(54) CDMA transceiver techniques for multiple input multiple output (MIMO) wireless
communications



(57) The present invention is related to a method for
multi-user wireless transmission of data signals in a
communication system having at least one base station
and at least one terminal. It comprises, for a plurality of
users, the following steps:

- adding robustness to frequency-selective fading to
the data to be transmitted,
- performing spreading and scrambling of at least a
portion of a block of data, obtainable by grouping
data symbols by demultiplexing using a serial-to-parallel
operation,
- combining (summing) spread and scrambled portions
of the blocks of at least two users,
- adding transmit redundancy to the combined
spread and scrambled portions, and
- transmitting the combined spread and scrambled
portions with transmit redundancy.
Description

Field of the invention 

[0001] The present invention is related to a method for Wideband Code Division Multiple Access (WCDMA) wireless
communication systems, suitable for communication over frequency-selective fading channels.

Introduction to the state of the art

[0002] Wideband CDMA is emerging as the predominant wireless access mode for forthcoming 3G systems, because
it offers higher data rates and supports a larger number of users over mobile wireless channels compared to access
techniques like TDMA and narrowband CDMA. Especially in the downlink (from base to mobile station) direction, the
main drivers toward future broadband cellular systems require higher data rates. There are several main challenges
to successful transceiver design. First, for increasing data rates, the underlying multi-path channels become more
time-dispersive, causing Inter-Symbol Interference (ISI) and Inter-Chip Interference (ICI), or equivalently
frequency-selective fading. Second, due to the increasing success of future broadband services, more users will try to access
the common network resources, causing Multi-User Interference (MUI). Both ISI/ICI and MUI are important performance
limiting factors for future broadband cellular systems, because they determine their capabilities in dealing with high
data rates and system loads, respectively. Third, cost, size and power consumption issues put severe constraints on
the receiver complexity at the mobile.

[0003] Direct-Sequence (DS) Code Division Multiple Access (CDMA) has emerged as the predominant air interface
technology for the 3G cellular standard, because it increases capacity and facilitates network planning in a cellular
system. DS-CDMA relies on the orthogonality of the spreading codes to separate the different user signals. However,
ICI destroys the orthogonality among users, giving rise to MUI. Since the MUI is essentially caused by the multi-path
channel, linear chip-level equalization, combined with correlation with the desired user's spreading code, allows the
MUI to be suppressed. However, chip equalizer receivers suppress MUI only statistically, and require multiple receive
antennas to cope with the effects caused by deep channel fades.

[0004] Multiple Input Multiple Output (MIMO) systems with several transmit and several receive antennas are able
to realize a capacity increase in rich scattering environments. Space-Time coding is an important class of MIMO
communication techniques that achieve a high quality-of-service over frequency-flat fading channels by introducing both
temporal and spatial correlation between the transmitted signals. It has already been combined with single-carrier
block transmission to achieve maximum diversity gains over frequency-selective fading. Up to now, however, the focus
was mainly on single-user point-to-point communication links.

Aims of the invention

[0005] The present invention aims to provide a method and device for Wideband Code Division Multiple Access 
(WCDMA) wireless communication that preserves the orthogonality among users and guarantees symbol detection 
regardless of the frequency-selective fading channels. 


Summary of the invention 

[0006] The invention relates to a method for multi-user wireless transmission of data signals in a communication
system having at least one base station and at least one terminal. It comprises, for a plurality of users, the following
steps:

- adding robustness to frequency-selective fading to said data to be transmitted,
- performing spreading and scrambling of at least a portion of a block of data, obtainable by grouping data symbols
by demultiplexing using a serial-to-parallel operation,
- combining (summing) spread and scrambled portions of said blocks of at least two users,
- adding transmit redundancy to said combined spread and scrambled portions, and
- transmitting said combined spread and scrambled portions with transmit redundancy.

[0007] Preferably the spreading and scrambling operation is performed by a code sequence, obtained by multiplying
a user(terminal)-specific code and a base station specific scrambling code.
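
The composite code of paragraph [0007] can be illustrated with a minimal sketch. It assumes Walsh-Hadamard user codes (named later in the description) and a real-valued +1/-1 scrambling pattern chosen arbitrarily for illustration; practical scrambling codes are typically long and complex-valued. The helper names are hypothetical. The sketch checks that chip-wise multiplication by a common scrambling code leaves the user codes mutually orthogonal.

```python
def walsh_hadamard(n):
    """Return the n x n Walsh-Hadamard matrix (n a power of two) as rows of +1/-1."""
    rows = [[1]]
    while len(rows) < n:
        # Sylvester construction: [[H, H], [H, -H]]
        rows = [r + r for r in rows] + [r + [-c for c in r] for r in rows]
    return rows


def composite_codes(n, scrambling):
    """Multiply every user code chip by chip with the shared scrambling code."""
    return [[w * s for w, s in zip(row, scrambling)] for row in walsh_hadamard(n)]


def inner(a, b):
    """Real inner product of two code sequences."""
    return sum(x * y for x, y in zip(a, b))


N = 8
scrambling = [1, -1, -1, 1, -1, -1, 1, 1]   # arbitrary illustrative pattern (assumption)
codes = composite_codes(N, scrambling)

# Gram matrix of the composite codes: N on the diagonal, 0 elsewhere.
gram = [[inner(a, b) for b in codes] for a in codes]
```

Because every scrambling chip has unit modulus, it cancels in each inner product, so the Gram matrix equals N times the identity, exactly as for the unscrambled codes.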

[0008] The steps mentioned above can be preceded by the step of generating a plurality of independent
block portions.

[0009] The method can also start with the step of generating block portions. 
[0010] Advantageously all the steps are performed as many times as there are block portions, thereby generating
streams comprising a plurality of combined spread and scrambled block portions.

[0011] In a specific embodiment, the method comprises, between the step of combining and the step of transmitting
said spread and scrambled portions, the step of encoding each of said streams.
[0012] More specifically, said step comprises space-time encoding said streams, thereby combining information from at
least two of said streams.

[0013] Even more specifically the step of space-time encoding the streams is performed by block space-time encoding
or trellis space-time encoding.

[0014] In an alternative embodiment, the step of inverse subband processing is comprised between the step of
combining and the step of transmitting the spread and scrambled portions.

[0015] In an advantageous embodiment the step of adding robustness to frequency-selective fading is performed 
by adding linear precoding. 

[0016] Alternatively, the step of adding robustness to frequency-selective fading is performed by applying adaptive 
loading per user. 

[0017] In a typical embodiment the step of combining spread and scrambled block portions includes the summing
of a pilot signal.

[0018] In the method of the invention the step of adding transmit redundancy comprises the addition of a cyclic prefix,
a zero postfix or a symbol postfix.

[0019] The invention also relates to a transmit system device for wireless multi-user communication, applying the
method here described.

[0020] Another object of the invention relates to a transmit apparatus for wireless multi-user communication,
comprising:

- Circuitry for grouping data symbols to be transmitted,
- Means for applying a spreading and scrambling operation to said grouped data symbols,
- Circuitry for adding transmit redundancy to said spread and scrambled grouped data symbols,
- At least one transmit antenna for transmitting said spread and scrambled grouped data symbols with transmit
redundancy.

[0021] In a specific embodiment the transmit apparatus also comprises means for adding robustness to
frequency-selective fading to the grouped data symbols.

[0022] In a preferred embodiment the transmit apparatus also comprises a space-time encoder. 
[0023] In another preferred embodiment the transmit apparatus also comprises circuits for inverse subband process- 
ing. 

[0024] The invention also relates to a method for receiving at least one signal in a multi-user wireless communication
system having at least one base station and at least one terminal, comprising the steps of:

- Receiving a signal from at least one antenna,
- Subband processing of a version of said received signal,
- Separating the contributions of the various users in said received signal,
- Exploiting the additional robustness to frequency-selective fading property of said received signal.

[0025] In a particular embodiment the step of separating the contributions consists in first filtering at chip rate at least 
a portion of the subband processed version of said received signal and then despreading. 
[0026] In another particular embodiment the step of separating the contributions consists in first despreading and
then filtering at least a portion of the subband processed version of said received signal.

[0027] In a typical embodiment the step of receiving a signal is performed for a plurality of antennas, thereby gen- 
erating data streams and wherein the step of subband processing is performed on each of said data streams, yielding 
a subband processed version of said received signal. 
[0028] In a specific embodiment the additional step of space-time decoding is performed on each of the streams.
[0029] To be even more precise the step of space-time decoding can be performed by block decoding or trellis 
decoding. 

[0030] In another embodiment the additional step of inverse subband processing is performed on at least one filtered, 
subband processed version of the received signal. 
[0031] Preferably the step of filtering is carried out by a filter of which the coefficients are determined in a semi-blind
fashion or in a training-based way.

[0032] In another embodiment the step of filtering is carried out by a filter of which the filter coefficients are determined 
without channel estimation. 
[0033] Advantageously the step of filtering at chip rate is carried out by a filter of which the filter coefficients are
determined such that one version of the filtered signal is as close as possible to a version of the pilot symbol.
[0034] More in particular, the version of the filtered signal is the filtered signal after despreading with a composite
code of the base station specific scrambling code and the pilot code and wherein the version of the pilot symbol is the
pilot symbol itself, put in per tone ordering.

[0035] In another particular embodiment the version of the filtered signal is the filtered signal after projecting on the
orthogonal complement of the subspace spanned by the composite codes of the base station specific scrambling code
and the user specific codes. The version of the pilot symbol is the pilot symbol spread with a composite code of the
base station specific scrambling code and the pilot code, and put in per tone ordering.

[0036] Typically, the additional step of removing transmit redundancy is performed.

[0037] In a particular embodiment the additional robustness to fading is exploited by linear de-precoding. 

[0038] The invention also relates to a receive system device for wireless multi-user communication, applying the
method as described above.

[0039] Another object of the invention relates to a receiver apparatus for wireless multi-user communication,
comprising:

- A plurality of antennas receiving signals,
- A plurality of circuits adapted for subband processing of said received signals,
- Circuitry being adapted for determining by despreading an estimate of subband processed symbols received by
at least one user.

[0040] In an embodiment the circuitry adapted for determining an estimate of symbols comprises a plurality of circuits 
for inverse subband processing. 

[0041] In a specific embodiment the circuitry adapted for determining an estimate of symbols further comprises a
plurality of filters to filter at least a portion of a subband processed version of said received signals.
[0042] Even more specifically the filtering is performed at chip rate. 
[0043] Finally, the apparatus further comprises a space-time decoder. 

Short description of the drawings 


[0044] 

Fig. 1 represents a telecommunication system in a single-cell configuration.

Fig. 2 represents a telecommunication system in a multiple-cell configuration.

Fig. 3 represents a block diagram of a receiver structure.

Fig. 4 represents the Multi-Carrier Block-Spread CDMA downlink transmission scheme.

Fig. 5 represents the MUI-resilient MCBS-CDMA downlink reception scheme.

Fig. 6 represents the Space-Time Block Coded MCBS-CDMA downlink transmission scheme.

Fig. 7 represents the MUI-resilient STBC/MCBS-CDMA MIMO reception scheme.

Fig. 8 represents the transmitter model for Space-Time Coded MC-DS-CDMA with Linear Precoding.

Fig. 9 represents the receiver model for Space-Time Coded MC-DS-CDMA with Linear Precoding.

Fig. 10 represents the transmitter model for Space-Time Coded SCBT-DS-CDMA.

Fig. 11 represents the receiver model for Space-Time Coded SCBT-DS-CDMA.

Fig. 12 represents the base station transmitter model for SCBT-DS-CDMA with KSP.

Fig. 13 represents the mobile station receiver model for SCBT-DS-CDMA with KSP.

Detailed description of the invention 

[0045] In the invention methods for W-CDMA wireless communication between devices and the related devices are
presented (Fig 1). In the communication system at least data (10) is transmitted from at least one base station (100)
to at least one terminal (200). The communication method is extendable to a case with a plurality of base stations (Fig
2, 100, 101), each base station being designed for covering a single cell (Fig 2, 20, 21) around such base station. In
such a multiple base station and hence multicell case a terminal typically receives signals from both the nearest
base station and other base stations. Within the method it is assumed that the base station has at least one antenna
(110) and the terminal also has at least one physical antenna (210). The communication between the base station(s)
and the terminal is designed such that said communication is operable in a context with multiple terminals. Hence it is
assumed that substantially simultaneous communication between said base station(s) and a plurality of terminals is
taking place, while still being able to distinguish at the terminal side which information was intended to be transmitted
to a dedicated terminal.

[0046] The notion of a user is introduced. It is assumed that with each terminal in such a multi-terminal context at
least one user is associated. The invented communication method and related devices exploit spreading with orthogonal
codes as method for separating information streams being associated with different users. Hence at the base
station side information, more in particular data symbols, of different users, hence denoted user specific data symbols,
is available. After spreading, spread user specific data symbols are obtained. These spread user specific data symbols
are added, leading to a sum signal of spread user specific data symbols. Further additional scrambling of said sum
signal is performed by a scrambling code being base station specific. Symbols obtained after spreading or spreading
and scrambling, and summing are denoted chip symbols. In the invented communication method blocks with a plurality
of said chip symbols are transmitted. In a single base station case the transmitted signal thus comprises a plurality of
time overlapped coded signals, each coded signal being associated to an individual user and distinguishable only by
a user specific encoding, based on the user signature or spreading codes. In a multiple base station context, the
distinguishing also exploits said base station specific code. Further such blocks have at least one pilot symbol (also
called training sequence), being predetermined and known at both sides of the transmission link.

[0047] In many embodiments of the invented method the availability of a receiver being capable of generating at
least two independent signals from a received signal is foreseen. Said receiver receives a spread-spectrum signal,
corresponding to a superposition of the signals of all users active in the communication system or link; more in particular,
said superposition of signals is channel distorted. Said generation of at least two independent signals can be obtained
by having at least two antennas at the terminal, each independent signal being the signal received at such antenna
after the typical down-converting and filtering steps.

[0048] Recall that the invention exploits spreading with orthogonal codes for separating different users. Unfortunately
said channel distortion destroys the orthogonality of the used codes, leading to a bad separation. This problem is
known as multi-user interference (MUI). Hence it is a requirement for the method of the invention to allow retrieval of
a desired user's symbol sequence from a received signal transmitted in a communication context with severe
multi-user interference. An additional aid in achieving this goal can come from the use of transmit redundancy. Applying
transmit redundancy helps to remove or at least to weaken the effect of the time dispersion of the multi-path channel.
A well known example of this is the addition of a cyclic prefix in a multi-carrier system. The method of the invention
comprises a step of inputting or receiving said received signal, being a channel distorted version of a transmitted signal
comprising a plurality of user data symbol sequences, each being encoded with a known, user specific code.
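
The cyclic prefix mentioned above as an example of transmit redundancy can be sketched as follows. The channel taps, block length and block contents are assumptions chosen purely for illustration, and noise is ignored. The sketch shows that, when the prefix is at least as long as the channel order, dropping the prefix at the receiver turns the multi-path channel into an independent complex gain on each tone.

```python
import cmath


def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * q * k / n) for k in range(n))
            for q in range(n)]


def linear_convolve(x, h):
    """Linear convolution, modelling a noise-free multi-path channel."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


Q = 8                               # block length (number of tones), illustrative
h = [1.0, 0.5, 0.25]                # hypothetical channel impulse response
L = len(h) - 1                      # channel order
x = [1, -1, 1, 1, -1, 1, -1, -1]    # one time-domain chip block, illustrative

u = x[-L:] + x                      # transmitter: copy the last L chips in front
r = linear_convolve(u, h)           # channel
y = r[L:L + Q]                      # receiver: discard the prefix, keep Q samples

# After prefix removal the channel acts as a circular convolution, so in the
# frequency domain each tone sees only its own channel gain.
H = dft(h + [0.0] * (Q - len(h)))
X = dft(x)
Y = dft(y)
max_err = max(abs(Y[q] - H[q] * X[q]) for q in range(Q))
```

Here `max_err` is zero up to rounding, which is the time-dispersion-absorbing effect the paragraph attributes to transmit redundancy.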

[0049] The multi-channel propagation also gives rise to multipath fading, which generally exhibits both
frequency-selectivity and time-selectivity. The phenomenon can give rise to serious performance degradation and constitutes a
bottleneck for higher data rates. Frequency-selective fading can be tackled in several ways, as is discussed below.
[0050] In the method of the invention the multi-user interference is suppressed by performing operations on the chip
symbols. This multi-user interference suppression is obtained by combining said independent signals resulting in a
combined filtered signal. In an embodiment of the invention said combining, also denoted chip-level equalization, is a
linear space-time combining. For said combining a combiner filter (chip-level equalizer) is used. The (chip-level
equalization) filter coefficients of said combiner filter are determined directly from said independent signals, hence without
estimating the channel characteristics. One can state that from said independent signals in a direct and deterministic
way a chip-level equalization filter is determined. Said chip-level equalization filter is such that said transmitted signal
is retrieved when applying said filter to said received signal.

[0051] In the approach of the invention all system parameters are chosen such that the orthogonality between the
various users is maintained, i.e. the MUI is combated in the most efficient way. To obtain that goal orthogonal user
specific spreading codes are used, a block spreading operation is applied to the symbols and transmit redundancy is
added such that the time dispersion is absorbed sufficiently. In this way the various users can properly be decoupled.
A block spreading operation is realized by converting a symbol sequence into blocks by a serial-to-parallel conversion.
In order to enhance each user's robustness against deep fading effects one additionally applies techniques like linear
precoding or adaptive loading.

[0052] In case channel state information is available at the transmitter, e.g., for stationary or low-speed users,
multicarrier transmission allows adaptive loading to be applied to exploit the inherent frequency diversity of the channel
without adding extra redundancy. Since the different users are perfectly decoupled, adaptive loading can be performed on a
per user basis, such that for every user the optimal transmit spectrum can be obtained without having to bother about
the presence of other users. Specifically, adaptive loading assigns more information (through higher order constellations)
and more power to the good subcarriers (with a high channel gain) while less information (through lower order
constellations) and less power, or even none at all, is assigned to the bad subcarriers (with a low channel gain).
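
The loading principle of paragraph [0052] can be illustrated with a minimal sketch. The text does not prescribe a particular loading algorithm; the greedy rule below, the cost model (power for b bits proportional to (2^b - 1) divided by the channel gain) and the gain values are all assumptions made for illustration.

```python
def greedy_loading(gains, total_bits, max_bits=6):
    """Distribute total_bits over subcarriers, one bit at a time.

    Each bit goes to the subcarrier where it costs the least extra power,
    under the assumed cost model power(b) = (2**b - 1) / gain.
    """
    bits = [0] * len(gains)
    power = [0.0] * len(gains)
    for _ in range(total_bits):
        best, best_cost = None, float("inf")
        for k, g in enumerate(gains):
            if bits[k] >= max_bits or g <= 0:
                continue  # constellation limit reached or unusable subcarrier
            cost = (2 ** (bits[k] + 1) - 2 ** bits[k]) / g
            if cost < best_cost:
                best, best_cost = k, cost
        if best is None:
            break  # no subcarrier can carry another bit
        bits[best] += 1
        power[best] += best_cost
    return bits, power


gains = [4.0, 1.0, 0.05, 2.0]   # hypothetical per-subcarrier channel power gains
bits, power = greedy_loading(gains, total_bits=6)
```

With these illustrative gains the strongest subcarrier receives the largest constellation and the deeply faded one receives no bits and no power, which matches the qualitative behaviour described above.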

[0053] In case no channel state information is available at the transmitter, e.g., for medium- to high-speed users,
linear precoding can be applied to robustify the transmission against frequency-selective fading. Specifically, at the
transmitter, the information symbols are linearly precoded on a subset of subcarriers, while adding some well-defined
redundancy, such that the information symbols can always be uniquely recovered from this subset, even if some of
the subcarriers are in a deep channel fade. At the receiver, the available frequency diversity is then exploited by
performing either joint or separate equalization and decoding.
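
The recoverability property of paragraph [0053] can be illustrated with a small worked example. It assumes a redundant precoder with B = 2 symbols on Q = 3 subcarriers whose rows have the Vandermonde form (1, rho_q) with distinct rho_q; this particular matrix is an illustrative choice, not one taken from the text. It further assumes that the surviving subcarriers have already been equalized and ignores noise.

```python
def precode(theta, s):
    """Apply the Q x B precoding matrix theta to the symbol block s."""
    return [sum(t * x for t, x in zip(row, s)) for row in theta]


def solve_2x2(a, b):
    """Solve a @ s = b for a 2 x 2 complex system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


# Illustrative precoder: rows (1, rho_q) with distinct rho_q = 1, j, -1.
theta = [[1, 1], [1, 1j], [1, -1]]
s = [1 + 1j, -1 + 1j]            # two information symbols
s_tilde = precode(theta, s)      # three precoded symbols, one per subcarrier

# Lose each subcarrier in turn and recover s from the two that remain.
recovered = []
for faded in range(3):
    alive = [q for q in range(3) if q != faded]
    a = [theta[q] for q in alive]
    b = [s_tilde[q] for q in alive]
    recovered.append(solve_2x2(a, b))
```

Any two rows of this precoder form an invertible matrix, so the symbol block is recovered whichever single subcarrier is lost.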

[0054] Figure 3 shows a general scheme of a receiver system. The elements presented there are used in various 
combinations in the embodiments described below. One or preferably several antennas (210) are foreseen for receiving 
signals. Next circuits (400) are provided to apply a subband processing to said received signals. The subband proc- 
essed receive signals are then applied to a block (500) adapted for determining an estimate of the data symbols of at 
least one user. Hereby use can be made of several other functional blocks, performing the tasks of inverse subband 
processing (510), removing robustness added to the transmitted symbols in order to provide protection against
frequency-selective fading (520), filtering (530) and/or despreading (540). Subband processing of a data signal having a
given data rate consists in principle of splitting said data signal into a plurality of data signals with a lower data rate and
modulating each of said plurality of data signals with another carrier. Said carriers are preferably orthogonal. In an
embodiment said subband processing of a data signal can be realized by using serial-to-parallel converters and using 
a transformation on a group of data samples of said data signal. Which operations are effectively applied and in which 
order highly depends on the specific embodiment. More details are given below. 

[0055] In a first embodiment of the invention a multi-carrier block-spread CDMA transceiver is disclosed that pre- 
serves the orthogonality between users and guarantees symbol detection. In this approach the M user data symbol 
sequences are transformed into a multi-user chip sequence. Apart from the multiplexing and the inverse subband 
processing three major operations are performed : linear precoding, block spreading and adding transmit redundancy. 
Each user's data symbol sequence is converted into blocks and spread with a user specific composite code sequence 
being the multiplication of an orthogonal spreading code specific to the user and a base station specific scrambling 
code. The chip block sequences of other users are added and the resulting sum is IFFT transformed to the time domain. 
Then transmit redundancy is added to the chip blocks to cope with the time-dispersive effect of multi-path propagation. 
The sequence that comes out is then transmitted. At the receiver side perfect synchronization is assumed. In the mobile 
station of interest the operations corresponding to those at the transmitter side are performed. The added transmit 
redundancy is removed and an FFT operation is performed. The FFT output is despread with the desired user's
composite code sequence. This operation decouples the various users in the system, i.e. all MUI is successfully eliminated.
For each individual user an equalization filter is then provided. The equalizer filters can be designed for jointly
equalizing and decoding or for separately performing said operations. 
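
The decoupling of users by block despreading described in paragraph [0055] can be sketched as follows. The sketch works directly in the frequency domain, i.e. it assumes that redundancy removal and the FFT have already reduced the channel to one complex gain per tone, that this gain stays constant over the N chip blocks of one symbol block, and that there is no noise. The codes, symbol values and tone gains are illustrative assumptions.

```python
def walsh_hadamard(n):
    """Return the n x n Walsh-Hadamard matrix (n a power of two) as rows of +1/-1."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-c for c in r] for r in rows]
    return rows


def block_spread(user_blocks, codes):
    """Return N chip blocks; chip block n is sum over users of code[n] * block."""
    n_chips = len(codes[0])
    q_tones = len(user_blocks[0])
    return [[sum(c[n] * blk[q] for c, blk in zip(codes, user_blocks))
             for q in range(q_tones)] for n in range(n_chips)]


def block_despread(chip_blocks, code):
    """Correlate the N received chip blocks with one user's code, tone by tone."""
    n_chips = len(code)
    q_tones = len(chip_blocks[0])
    return [sum(code[n].conjugate() * chip_blocks[n][q] for n in range(n_chips)) / n_chips
            for q in range(q_tones)]


N = 4
scrambling = [1, -1, -1, 1]      # illustrative +1/-1 scrambling pattern
codes = [[w * s for w, s in zip(row, scrambling)] for row in walsh_hadamard(N)]

# Three active users, one (precoded) block of four tones each.
user_blocks = [[1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j],
               [1 - 1j, 1 - 1j, 1 + 1j, -1 + 1j],
               [-1 - 1j, 1 + 1j, 1 - 1j, 1 + 1j]]
tone_gains = [0.9 + 0.2j, 0.1 - 0.7j, -0.5 + 0.5j, 1.2 + 0j]   # hypothetical channel

tx = block_spread(user_blocks, codes[:3])
rx = [[g * x for g, x in zip(tone_gains, blk)] for blk in tx]
out = block_despread(rx, codes[0])

# What user 0 would receive if it were alone in the cell.
expected = [g * s for g, s in zip(tone_gains, user_blocks[0])]
mui_residual = max(abs(o - e) for o, e in zip(out, expected))
```

Because whole blocks are multiplied by a single code chip, the channel cannot mix chips of different code positions, so `mui_residual` is zero up to rounding and only single-user equalization remains.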

[0056] In a second embodiment of the invention space-time coding techniques, originally proposed for point-to-point 
communication links, are extended to point-to-multipoint communication links. The multiple access technique in the 
design of the transmission scheme is thereby taken into account. Each user's data symbol sequence is converted into 
blocks. The resulting block sequence is then linearly precoded as to add robustness against frequency-selective fading. 
The precoded data are put on the various tones of a multi-carrier system. Next the blocks are demultiplexed into a 
number of parallel sequences. Each of the sequences is spread with the same user code sequence, being the multi- 
plication of the user specific orthogonal spreading code and the base-station specific scrambling code. In each of the 
parallel streams the different user chip block sequences are added up together with the pilot chip block sequence. 
Then a block space-time encoding operation takes place. The space-time coding is implemented on each tone separately
at the base station. Instead of block space-time encoding, a trellis space-time encoding scheme may also be
envisaged. The usual multicarrier operations of inverse fast Fourier transforming and adding a cyclic prefix then follow
in each stream before transmitting the signal. The cyclic prefix represents in this case said transmit redundancy. The 
mobile station of interest at the receiver side is equipped with multiple receive antennas. The operations corresponding 
to those at the transmitter side are performed on each of the received signals, starting with the cyclic prefix removal 
and the FFT operation. The space-time block decoding operation is performed. The space-time decoded output is next 
re-ordered on a tone-per-tone base. Next per-tone chip equalization is applied. The filter coefficients can be determined 
in a training based or in a semi-blind way. In the training-based approach one relies on the knowledge of a pilot symbol. 
The equalizer coefficients are determined such that the equalized output after despreading is as close as possible to 
a version of the pilot symbol, being the pilot symbol itself, put in per tone ordering. In the semi-blind approach one 
relies not only on the knowledge of said pilot symbol, but also on characteristics of the codes. The equalizer filter output 
is projected on the orthogonal complement of the subspace spanned by the composite codes of the various users (i.e.
the codes resulting from the multiplication of the base station specific scrambling code and the user specific codes).
This projected output must then be as close as possible to the pilot symbol spread with a composite code of the base
station specific scrambling code and the pilot code, and put in per tone ordering. From the equalizer output the
contribution of that specific user can easily be derived after removing the additional robustness against deep fading.
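
The training-based criterion of paragraph [0056] can be illustrated in a strongly simplified form. The sketch assumes a single receive antenna, a single equalizer coefficient on one tone, a noise-free channel, and a least-squares reading of "as close as possible"; none of these simplifications is prescribed by the text, and the numerical values are illustrative.

```python
def ls_single_tap(despread, pilots):
    """Least-squares tap g minimising sum_i |g * despread[i] - pilots[i]|**2."""
    num = sum(d.conjugate() * p for d, p in zip(despread, pilots))
    den = sum(abs(d) ** 2 for d in despread)
    return num / den


tone_gain = 0.3 - 0.8j                        # hypothetical channel gain on one tone
pilots = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j]

# Output of despreading with the pilot composite code, one value per block.
despread = [tone_gain * p for p in pilots]

g = ls_single_tap(despread, pilots)
equalized = [g * d for d in despread]
```

In this noise-free setting the fitted coefficient equals the inverse of the tone gain, so the equalized output reproduces the pilot symbols. With several receive antennas the same criterion would yield one coefficient vector per tone instead of a single scalar.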
[0057] In a third embodiment the base station again has multiple transmit antennas, but data is transmitted on a 
single carrier. The data symbol sequence is demultiplexed into several streams, which subsequently are converted 
into symbol blocks that are spread with the same user composite code sequence being the multiplication of the user 
specific orthogonal spreading code and the base station specific scrambling code. The pilot chip block sequence is 
added to the different user chip block sequences. The chip blocks output by said encoder are padded with a zero 
postfix, parallel-to-serial converted and sent to the transmit antenna. Said postfix provides transmit redundancy. The
receiver is again equipped with multiple receive antennas. Suppose the mobile station of interest has acquired perfect 
synchronization. After conversion into chip blocks the data on each receive antenna are block space-time decoded. 
Each decoded block is transformed into the frequency domain. The per receive antenna ordering is transformed into 
a per tone ordering. Then again a per tone chip level equalization can be performed. After re-transforming to the time
domain and removal of the zero postfix, despreading finally yields an estimation of the desired user's transmitted data 
symbols. 

[0058] In a fourth embodiment the base station has a single transmit antenna, whereas the mobile station of interest
may have multiple receive antennas. Each user's data symbol sequence is converted into symbol blocks and spread
with the user composite code sequence, being the multiplication of a user specific spreading code and a base station
specific scrambling code. The pilot symbol is treated in the same way. The different user chip block sequences and the
pilot chip block sequence are added. At the end of each block a number of zeros are padded. Also a known symbol postfix,
the length of which is the same as the number of zeros padded, is added to each block. After P/S conversion the chip
sequence is transmitted. At the receiver side the mobile station of interest is equipped with multiple antennas and has
acquired perfect synchronisation. Assuming the known symbol postfix is long enough, there is no interblock interference
present in the received signal. After transformation into the frequency domain a per tone chip equalizer filter is foreseen for
each antenna. By transforming back into the time domain and removing the known symbol postfix the desired user's
symbols can be retrieved.

[0059] Below various embodiments of the invention are described. The invention is not limited to these embodiments
but only by the scope of the claims.

A. Embodiment 

A.1 Transceiver design 



[0060] Given the asymmetric nature of broadband services requiring much higher data rates in downlink than in
uplink direction, we focus on the downlink bottleneck of future broadband cellular systems. Our goal is to design a
transceiver that can cope with the three main challenges of broadband cellular downlink communications. First,
multi-path propagation gives rise to time dispersion and frequency-selective fading causing ISI and ICI, which limit the
maximum data rate of a system without equalization. Second, multiple users trying to access common network resources
may interfere with each other, resulting in MUI, which upper-bounds the maximum user capacity in a cellular system.
Specific to DS-CDMA downlink transmission, the MUI is essentially caused by multi-path propagation, since it destroys
the orthogonality of the user signals. Third, cost, size and power consumption issues put severe constraints on the
receiver complexity at the mobile.

[0061] Throughout the text, we consider a single cell of a cellular system with a Base Station (BS) serving M active
Mobile Stations (MSs) within its coverage area. For now, we limit ourselves to the single-antenna case and defer the
multi-antenna case to later.

A.1.a Multi-carrier block-spread CDMA transmission. 


[0062] The block diagram in Fig. 4 describes the Multi-Carrier Block-Spread (MCBS) CDMA downlink transmission
scheme (where only the m-th user is explicitly shown), that transforms the M user data symbol sequences

$$\{s_m[i]\}_{m=1}^{M}$$

into the multi-user chip sequence $u[n]$ with a rate $1/T_c$. Apart from the user multiplexing and the IFFT, the transmission
scheme performs three major operations, namely linear precoding, block spreading, and adding transmit redundancy.
Since our scheme belongs to the general class of block transmission schemes, the m-th user's data symbol sequence
$s_m[i]$ is first serial-to-parallel converted into blocks of $B$ symbols, leading to the symbol block sequence

$$\mathbf{s}_m[i] := \big[\, s_m[iB], \ldots, s_m[(i+1)B-1] \,\big]^T .$$

The blocks $\mathbf{s}_m[i]$ are linearly precoded by a $Q \times B$ matrix $\boldsymbol{\Theta}$ to yield the $Q \times 1$ precoded symbol blocks:

$$\tilde{\mathbf{s}}_m[i] := \boldsymbol{\Theta}\, \mathbf{s}_m[i] \qquad (1)$$

where the linear precoding can be either redundant (Q>B) or non-redundant (Q=B). For conciseness, we limit our
discussion to redundant precoding, but the proposed concepts apply equally well to non-redundant precoding. As we
will show later, linear precoding guarantees symbol detection and maximum frequency-diversity gains, and thus makes
the transmission robust against frequency-selective fading. Unlike the traditional approach of symbol spreading that
operates on a single symbol, we apply here block spreading that operates on a block of symbols. Specifically, the block
sequence $\tilde{\mathbf{s}}^m[i]$ is spread by a factor N with the user composite code sequence $c^m[n]$, which is the
multiplication of a short orthogonal Walsh-Hadamard spreading code that is MS specific and a long overlay scrambling
code that is BS specific. The chip block sequences of the different active users are added, resulting in the multi-user
chip block sequence:

$\tilde{\mathbf{x}}[n] = \sum_{m=1}^{M} \tilde{\mathbf{s}}^m[i] \, c^m[n]$ (2)

where the chip block index n is related to the symbol block index i by $n = iN + n'$, $n' \in \{0, \ldots, N-1\}$. As
will become apparent later, block spreading enables MUI-resilient reception, and thus effectively deals with the MUI.
Subsequently, the $Q \times Q$ IFFT matrix $\mathbf{F}_Q^H$ transforms the Frequency-Domain (FD) chip block sequence
$\tilde{\mathbf{x}}[n]$ into the Time-Domain (TD) chip block sequence:

$\mathbf{x}[n] = \mathbf{F}_Q^H \cdot \tilde{\mathbf{x}}[n]$.

The $K \times Q$ transmit matrix $\mathbf{T}$, with $K \geq Q$, adds some redundancy to the chip blocks
$\mathbf{x}[n]$: $\mathbf{u}[n] := \mathbf{T} \cdot \mathbf{x}[n]$. As will be clarified later, this transmit
redundancy copes with the time-dispersive effect of multi-path propagation and also enables low-complexity
equalization at the receiver. Finally, the resulting transmitted chip block sequence $\mathbf{u}[n]$ is
parallel-to-serial converted into the corresponding scalar sequence

$[u[nK], \ldots, u[(n+1)K-1]]^T := \mathbf{u}[n]$

and transmitted over the air at a rate $1/T_c$.

A.1.b Channel model. 

[0063] Adopting a discrete-time baseband equivalent model, the chip-sampled received signal is a channel-distorted
version of the transmitted signal, and can be written as:

$v[n] = \sum_{l=0}^{L_c} h[l] \, u[n-l] + w[n]$ (3)

where $h[l]$ is the chip-sampled FIR channel that models the frequency-selective multi-path propagation between the
transmitter and the receiver including the effect of transmit and receive filters, $L_c$ is the order of $h[l]$, and
$w[n]$ denotes the additive Gaussian noise, which we assume to be white with variance $\sigma_w^2$. Furthermore, we
define L as a known upper bound on the channel order: $L \geq L_c$, which can be well approximated by

$L \approx \lfloor \tau_{max} / T_c \rfloor + 1$,

where $\tau_{max}$ is the maximum delay spread within the given propagation environment.

A.1.c MUI-resilient reception.

[0064] The block diagram in Fig. 5 describes the reception scheme for the MS of interest (which we assume to be
the m-th one), which transforms the received sequence $v[n]$ into an estimate of the desired user's data symbol
sequence $\hat{s}^m[i]$. Assuming perfect synchronization, the received sequence $v[n]$ is serial-to-parallel converted
into its corresponding block sequence

$\mathbf{v}[n] := [v[nK], \ldots, v[(n+1)K-1]]^T$.

From the scalar input/output relationship in (3), we can derive the corresponding block input/output relationship:

$\mathbf{v}[n] = \mathbf{H}[0] \cdot \mathbf{u}[n] + \mathbf{H}[1] \cdot \mathbf{u}[n-1] + \mathbf{w}[n]$, (4)

where

$\mathbf{w}[n] := [w[nK], \ldots, w[(n+1)K-1]]^T$

is the noise block sequence, $\mathbf{H}[0]$ is a $K \times K$ lower triangular Toeplitz matrix with entries

$[\mathbf{H}[0]]_{p,q} = h[p-q]$,

and $\mathbf{H}[1]$ is a $K \times K$ upper triangular Toeplitz matrix with entries

$[\mathbf{H}[1]]_{p,q} = h[K+p-q]$.

The time-dispersive nature of multi-path propagation gives rise to so-called Inter-Block Interference (IBI) between
successive blocks, which is modeled by the second term in (4). The $Q \times K$ receive matrix $\mathbf{R}$ again
removes the redundancy from the blocks $\mathbf{v}[n]$: $\mathbf{y}[n] := \mathbf{R} \cdot \mathbf{v}[n]$. The purpose
of the transmit/receive pair $(\mathbf{T}, \mathbf{R})$ is twofold. First, it allows for simple block-by-block
processing by removing the IBI. Second, it enables low-complexity frequency-domain equalization by making the linear
channel convolution appear circulant to the received block. To guarantee perfect IBI removal, the pair
$(\mathbf{T}, \mathbf{R})$ should satisfy the following condition:

$\mathbf{R} \cdot \mathbf{H}[1] \cdot \mathbf{T} = \mathbf{0}$. (5)

[0065] To enable circulant channel convolution, the resulting channel matrix
$\dot{\mathbf{H}} := \mathbf{R} \cdot \mathbf{H}[0] \cdot \mathbf{T}$ should be circulant. In this way, we obtain a
simplified block input/output relationship in the TD:

$\mathbf{y}[n] = \dot{\mathbf{H}} \cdot \mathbf{x}[n] + \mathbf{z}[n]$, (6)

where $\mathbf{z}[n] := \mathbf{R} \cdot \mathbf{w}[n]$ is the corresponding noise block sequence. In general, two
options for the pair $(\mathbf{T}, \mathbf{R})$ exist that satisfy the above conditions. The first option corresponds
to Cyclic Prefixing (CP) in classical OFDM systems, and boils down to choosing $K = Q + L$, and selecting:

$\mathbf{T} = \mathbf{T}_{cp} := [\mathbf{I}_{cp}^T, \mathbf{I}_Q^T]^T$, $\mathbf{R} = \mathbf{R}_{cp} := [\mathbf{0}_{Q \times L}, \mathbf{I}_Q]$, (7)

where $\mathbf{I}_{cp}$ consists of the last L rows of $\mathbf{I}_Q$. The circulant property is enforced at the
transmitter by adding a cyclic prefix of length L to each block. Indeed, premultiplying a vector with
$\mathbf{T}_{cp}$ copies its last L entries and pastes them to its top. The IBI is removed at the receiver by
discarding the cyclic prefix of each received block. Indeed, premultiplying a vector with $\mathbf{R}_{cp}$ deletes
its first L entries and thus satisfies (5).
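
The effect of the cyclic prefix can be checked numerically. The following is a minimal pure-Python sketch, not part of the original disclosure: the helper names, the block values and the channel taps are illustrative assumptions, and a single noise-free block is considered (any IBI would only fall on the discarded prefix). It shows that adding the prefix, passing the block through the linear FIR channel of (3), and discarding the prefix gives the same result as a circular convolution of the block with the channel.

```python
def add_cyclic_prefix(block, L):
    """Action of T_cp: copy the last L entries of the block and paste them on top."""
    return block[-L:] + block

def linear_channel(u, h):
    """Noise-free FIR channel v[n] = sum_l h[l] u[n-l], for one isolated block."""
    return [sum(h[l] * u[n - l] for l in range(len(h)) if n - l >= 0)
            for n in range(len(u))]

def remove_cyclic_prefix(block, L):
    """Action of R_cp: discard the first L entries of the received block."""
    return block[L:]

def circular_convolution(x, h):
    """Product of the circulant channel matrix with x."""
    Q = len(x)
    return [sum(h[l] * x[(n - l) % Q] for l in range(len(h))) for n in range(Q)]

x = [1, 2, 3, 4, 5, 6]   # one TD chip block, Q = 6 (illustrative values)
h = [1.0, 0.5, 0.25]     # assumed channel of order L_c = 2
L = 2                    # prefix length, L >= L_c

y = remove_cyclic_prefix(linear_channel(add_cyclic_prefix(x, L), h), L)
y_circ = circular_convolution(x, h)
max_diff = max(abs(a - b) for a, b in zip(y, y_circ))
```

Under these assumptions `max_diff` is zero up to rounding, which is the circulant behaviour that (6) relies on.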

[0066] The second option corresponds to Zero Padding (ZP), and boils down to setting $K = Q + L$, and selecting:

$\mathbf{T} = \mathbf{T}_{zp} := [\mathbf{I}_Q, \mathbf{0}_{Q \times L}]^T$, $\mathbf{R} = \mathbf{R}_{zp} := [\mathbf{I}_Q, \mathbf{I}_{zp}]$, (8)

where $\mathbf{I}_{zp}$ is formed by the first L columns of $\mathbf{I}_Q$. Unlike classical OFDM systems, here the
IBI is entirely dealt with at the transmitter. Indeed, premultiplying a vector with $\mathbf{T}_{zp}$ pads L trailing
zeros to its bottom, and thus satisfies (5). The circulant property is enforced at the receiver by time-aliasing each
received block. Indeed, premultiplying a vector with $\mathbf{R}_{zp}$ adds its last L entries to its first L entries.

[0067] Referring back to (6), circulant matrices possess a useful property that enables simple per-tone equalization
in the frequency-domain.

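Property 1: Circulant matrices can be diagonalized by FFT operations:
[0068]

$\dot{\mathbf{H}} = \mathbf{F}_Q^H \cdot \tilde{\mathbf{H}} \cdot \mathbf{F}_Q$, (9)

with

$\tilde{\mathbf{H}} := \mathrm{diag}(\tilde{\mathbf{h}})$,

$\tilde{\mathbf{h}} := [H(e^{j0}), H(e^{j2\pi/Q}), \ldots, H(e^{j2\pi(Q-1)/Q})]^T$

the FD channel response evaluated on the FFT grid,

$H(z) := \sum_{l=0}^{L} h[l] \, z^{-l}$

the z-transform of $h[l]$, and $\mathbf{F}_Q$ the $Q \times Q$ FFT matrix.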
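
Property 1 can be illustrated with a short pure-Python sketch. It is only a sketch under stated assumptions: the function names, the block values and the channel taps are hypothetical, and a unitary DFT (scaled by $1/\sqrt{Q}$) is assumed for $\mathbf{F}_Q$. The check is that, after a circular convolution with the channel, every tone of the output equals the corresponding tone of the input multiplied by the channel response on the FFT grid.

```python
import cmath

def dft(x):
    """Unitary DFT, i.e. multiplication with the Q x Q FFT matrix F_Q."""
    Q = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * q * n / Q) for n in range(Q)) / Q ** 0.5
            for q in range(Q)]

def channel_on_fft_grid(h, Q):
    """H(z) = sum_l h[l] z^{-l} evaluated at z = exp(j 2 pi q / Q), q = 0..Q-1."""
    return [sum(h[l] * cmath.exp(-2j * cmath.pi * q * l / Q) for l in range(len(h)))
            for q in range(Q)]

def circular_convolution(x, h):
    """Product of the circulant channel matrix with x."""
    Q = len(x)
    return [sum(h[l] * x[(n - l) % Q] for l in range(len(h))) for n in range(Q)]

x = [1, -1, 1, 1, -1, 1, -1, -1]   # illustrative TD chip block, Q = 8
h = [1.0, 0.5, 0.25]               # assumed channel taps

y = circular_convolution(x, h)
x_fd, y_fd = dft(x), dft(y)
h_fd = channel_on_fft_grid(h, len(x))

# Largest deviation from the per-tone relation y_fd[q] = h_fd[q] * x_fd[q].
max_error = max(abs(y_fd[q] - h_fd[q] * x_fd[q]) for q in range(len(x)))
```

The per-tone relation is what allows single-tap equalization on each tone instead of inverting a full $Q \times Q$ matrix.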

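[0069] Aiming at low-complexity FD processing, we transform $\mathbf{y}[n]$ into the FD by defining

$\tilde{\mathbf{y}}[n] := \mathbf{F}_Q \cdot \mathbf{y}[n]$.

Relying on Property 1, this leads to the following FD block input/output relationship:

$\tilde{\mathbf{y}}[n] = \tilde{\mathbf{H}} \cdot \tilde{\mathbf{x}}[n] + \tilde{\mathbf{z}}[n]$, (10)

where

$\tilde{\mathbf{z}}[n] := \mathbf{F}_Q \cdot \mathbf{z}[n]$

is the corresponding FD noise block sequence. Stacking N consecutive chip blocks $\tilde{\mathbf{y}}[n]$ into

$\tilde{\mathbf{Y}}[i] := [\tilde{\mathbf{y}}[iN], \ldots, \tilde{\mathbf{y}}[(i+1)N-1]]$,

we obtain the symbol block level equivalent of (10):

$\tilde{\mathbf{Y}}[i] = \tilde{\mathbf{H}} \cdot \tilde{\mathbf{X}}[i] + \tilde{\mathbf{Z}}[i]$, (11)

where $\tilde{\mathbf{X}}[i]$ and $\tilde{\mathbf{Z}}[i]$ are similarly defined as $\tilde{\mathbf{Y}}[i]$. From (2)
we also have that:

$\tilde{\mathbf{X}}[i] = \sum_{m=1}^{M} \tilde{\mathbf{s}}^m[i] \cdot \mathbf{c}^m[i]^T$, (12)

where

$\mathbf{c}^m[i] := [c^m[iN], \ldots, c^m[(i+1)N-1]]^T$
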

is the m-th user's composite code vector used to block spread its data symbol block $\tilde{\mathbf{s}}^m[i]$. By
inspecting (11) and (12), we can conclude that our transceiver preserves the orthogonality among users, even after
propagation through a (possibly unknown) frequency-selective multi-path channel. This property allows for deterministic
MUI elimination through low-complexity code-matched filtering. Indeed, by block despreading (11) with the desired
user's composite code vector $\mathbf{c}^m[i]$ (we assume the m-th user to be the desired one), we obtain:

$\tilde{\mathbf{y}}^m[i] := \tilde{\mathbf{Y}}[i] \cdot \mathbf{c}^m[i]^* = \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} \cdot \mathbf{s}^m[i] + \tilde{\mathbf{z}}^m[i]$, (13)

where

$\tilde{\mathbf{z}}^m[i] := \tilde{\mathbf{Z}}[i] \cdot \mathbf{c}^m[i]^*$

is the corresponding noise block sequence. Our transceiver successfully converts (through block despreading) a
multi-user chip block equalization problem into an equivalent single-user symbol block equalization problem. Moreover,
the operation of block despreading preserves Maximum-Likelihood (ML) optimality, since it does not incur any
information loss regarding the desired user's symbol block $\mathbf{s}^m[i]$.

A.1.d Single-user equalization.

[0070] After successful elimination of the MUI, we still need to detect the desired user's symbol block
$\mathbf{s}^m[i]$ from (13). Ignoring for the moment the presence of $\boldsymbol{\Theta}$ (or equivalently setting
Q=B and selecting $\boldsymbol{\Theta} = \mathbf{I}_Q$), this requires $\tilde{\mathbf{H}}$ to have full column rank
Q. Unfortunately, this condition only holds for channels that do not invoke any zero diagonal entries in
$\tilde{\mathbf{H}}$. In other words, if the MS experiences a deep channel fade on a particular tone (corresponding to
a zero diagonal entry in $\tilde{\mathbf{H}}$), the information symbol on that tone cannot be recovered. To guarantee
symbol detectability of the B symbols in $\mathbf{s}^m[i]$, regardless of the symbol constellation, we thus need to
design the precoder $\boldsymbol{\Theta}$ such that:

$\mathrm{rank}(\tilde{\mathbf{H}} \cdot \boldsymbol{\Theta}) = B$, (14)

irrespective of the underlying channel realization. Since an FIR channel of order L can invoke at most L zero diagonal
entries in $\tilde{\mathbf{H}}$, this requires any Q-L=B rows of $\boldsymbol{\Theta}$ to be linearly independent.
Two classes of precoders have been constructed that satisfy this condition and thus guarantee symbol detectability or
equivalently enable full frequency-diversity gain, namely the Vandermonde precoders and the cosine precoders. For
instance, a special case of the general cosine precoder is a truncated Discrete Cosine Transform (DCT) matrix.
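
The deterministic MUI elimination obtained by block despreading can be sketched numerically. The fragment below is a minimal illustration under simplifying assumptions that are not in the original text: unit-norm real Walsh-Hadamard codes without the scrambling overlay, a noise-free channel, two active users, and arbitrary per-tone channel gains and precoded blocks. All names are hypothetical.

```python
def walsh_hadamard(N):
    """Rows of the N x N Walsh-Hadamard matrix (N a power of two), entries +1/-1."""
    rows = [[1]]
    while len(rows) < N:
        rows = [r + r for r in rows] + [r + [-c for c in r] for r in rows]
    return rows

N = 4                                                   # spreading factor
codes = [[c / N ** 0.5 for c in row] for row in walsh_hadamard(N)]  # unit-norm codes

h_fd = [1 + 0.5j, 0.2 - 1j, -0.7 + 0.1j]                # diagonal of the FD channel, Q = 3
s_tilde = [                                             # precoded blocks of M = 2 users
    [1 + 1j, -1 + 1j, 1 - 1j],
    [-1 - 1j, 1 + 1j, 1 + 1j],
]
Q, M = len(h_fd), len(s_tilde)

# Block spreading and user multiplexing: X[q][n] = sum_m s_tilde[m][q] * c^m[n].
X = [[sum(s_tilde[m][q] * codes[m][n] for m in range(M)) for n in range(N)]
     for q in range(Q)]
# Noise-free FD reception: each tone is scaled by its own channel gain.
Y = [[h_fd[q] * X[q][n] for n in range(N)] for q in range(Q)]

def despread(Y, code):
    """Block despreading: multiply the Q x N matrix Y with the conjugated code vector."""
    return [sum(row[n] * complex(code[n]).conjugate() for n in range(len(code)))
            for row in Y]

y_user0 = despread(Y, codes[0])
expected0 = [h_fd[q] * s_tilde[0][q] for q in range(Q)]
residual_mui = max(abs(a - b) for a, b in zip(y_user0, expected0))
```

Because the codes stay orthogonal on every tone, `residual_mui` vanishes regardless of the chosen channel gains, which mirrors the single-user model of (13).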

A.2. Equalization options 

[0071] In this section, we discuss different options to perform equalization and decoding of the linear precoding,
either jointly or separately. These options allow us to trade off performance versus complexity, ranging from optimal
Maximum-Likelihood (ML) detection with exponential complexity to linear and decision-directed detection with linear
complexity. To evaluate the complexity, we distinguish between the initialization phase, where the equalizers are
calculated, and the data processing phase, where the actual equalization takes place. The rate of the former is related
to the channel's fading rate, whereas the latter is executed continuously at the symbol block rate.

A.2.a ML detection 

[0072] The ML algorithm is optimal in a Maximum Likelihood sense, but has a very high complexity. The likelihood
function of the received block $\tilde{\mathbf{y}}^m[i]$, conditioned on the transmitted block $\mathbf{s}^m[i]$, is
given by:

$p(\tilde{\mathbf{y}}^m[i] \mid \mathbf{s}^m[i]) = \frac{1}{(\pi \sigma_w^2)^Q} \exp\left( -\frac{\| \tilde{\mathbf{y}}^m[i] - \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} \cdot \mathbf{s}^m[i] \|^2}{\sigma_w^2} \right)$ (15)

Amongst all possible transmitted blocks, the ML algorithm retains the one that maximizes the likelihood function or,
equivalently, minimizes the Euclidean distance:

$\hat{\mathbf{s}}^m[i] = \arg\min_{\mathbf{s}^m[i] \in \mathcal{S}} \| \tilde{\mathbf{y}}^m[i] - \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} \cdot \mathbf{s}^m[i] \|^2$. (16)

[0073] In other words, the ML metric is given by the Euclidean distance between the actual received block and the
block that would have been received if a particular symbol block had been transmitted in a noiseless environment. The
number of possible transmit vectors in $\mathcal{S}$ is the cardinality of $\mathcal{S}$, i.e. $|\mathcal{S}| = M^B$,
with M the constellation size. So, the number of points to inspect during the data processing phase grows exponentially
with the initial block length B. Hence, this algorithm is only feasible for a small block length B and a small
constellation size M. Note that the ML algorithm does not require an initialization phase.
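
The exhaustive search of (16) can be sketched as follows. This is an illustrative pure-Python fragment, not the implementation of the disclosure: the function name `ml_detect`, the small real precoder, the BPSK alphabet and the channel values are assumptions chosen so that one tone is completely faded while any Q-L=B rows of the precoder remain linearly independent.

```python
from itertools import product

def ml_detect(y, h_fd, theta, alphabet):
    """Brute-force search over all |alphabet|**B candidate blocks, as in (16)."""
    Q, B = len(theta), len(theta[0])
    best, best_metric = None, float("inf")
    for cand in product(alphabet, repeat=B):
        metric = 0.0
        for q in range(Q):
            precoded = sum(theta[q][b] * cand[b] for b in range(B))
            metric += abs(y[q] - h_fd[q] * precoded) ** 2
        if metric < best_metric:
            best, best_metric = cand, metric
    return list(best)

theta = [[1, 0], [0, 1], [1, 1]]   # assumed Q x B = 3 x 2 precoder, any 2 rows independent
h_fd = [0, 1 + 1j, 0.5]            # per-tone channel with a deep fade on tone 0
s = [1, -1]                        # transmitted BPSK block (B = 2)

# Received block after despreading, with a small fixed perturbation standing in for noise.
y = [h_fd[q] * sum(theta[q][b] * s[b] for b in range(2)) + 0.1 for q in range(3)]

detected = ml_detect(y, h_fd, theta, alphabet=[1, -1])
```

The loop visits $M^B$ candidates, so the cost grows exponentially with the block length B, in line with the complexity remark above. In this example the block is still recovered although tone 0 carries no signal, which is the role of the redundant precoder.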

A.2.b Joint Linear Equalization and Decoding. 

[0074] Linear equalizers that perform joint equalization and decoding combine a low complexity with medium
performance. A first possibility is to apply a Zero-Forcing (ZF) linear equalizer:

$\mathbf{G}_{ZF} = (\boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta})^{-1} \cdot \boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H$, (17)

which completely eliminates the ISI, irrespective of the noise level. By ignoring the noise, it causes excessive noise
enhancement, especially at low SNR. A second possibility is to apply a Minimum Mean-Square-Error (MMSE) linear
equalizer:

$\mathbf{G}_{MMSE} = \left( \boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} + \frac{\sigma_w^2}{\sigma_s^2} \mathbf{I}_B \right)^{-1} \cdot \boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H$, (18)

which minimizes the MSE between the actual transmitted symbol block and its estimate. The MMSE linear equalizer
explicitly takes into account the noise variance $\sigma_w^2$ and the information symbol variance $\sigma_s^2$, and
balances ISI elimination with noise enhancement. From (17) and (18), it is also clear that $\mathbf{G}_{MMSE}$ reduces
to $\mathbf{G}_{ZF}$ at high SNR.

[0075] During the initialization phase, $\mathbf{G}_{ZF}$ and $\mathbf{G}_{MMSE}$ can be computed from the multiple
sets of linear equations, implicitly shown in (17) and (18), respectively. The solution can be found from Gaussian
elimination with partial pivoting, based on the LU decomposition, leading to an overall complexity of $O(QB^2)$. During
the data processing phase, the equalizers $\mathbf{G}_{ZF}$ and $\mathbf{G}_{MMSE}$ are applied to the received block
$\tilde{\mathbf{y}}^m[i]$, leading to a complexity of $O(QB)$.

A.2.c Joint Decision Feedback Equalization and Decoding.

[0076] On the one hand, the ML algorithm of Subsection A.2.a achieves the optimal performance but with a very
high complexity. On the other hand, the linear equalizers of Subsection A.2.b offer a low complexity but at a relatively
poor performance. The class of non-linear equalizers that perform joint decision feedback equalization and decoding
lies in between the former categories, both in terms of performance and complexity. Decision feedback equalizers exploit
the finite alphabet property of the information symbols to improve performance relative to linear equalizers. They consist
of a feedforward section, represented by the matrix $\mathbf{W}$, and a feedback section, represented by the matrix
$\mathbf{B}$:

$\hat{\mathbf{s}}^m[i] = \mathrm{slice}\left[ \mathbf{W} \cdot \tilde{\mathbf{y}}^m[i] - \mathbf{B} \cdot \hat{\mathbf{s}}^m[i] \right]$. (19)

[0077] The feedforward and feedback sections can be designed according to a ZF or MMSE criterion. In either case,
$\mathbf{B}$ should be a strictly upper or lower triangular matrix with zero diagonal entries, in order to feed back
decisions in a causal way. To design the decision feedback counterpart of the ZF linear equalizer, we compute the
Cholesky decomposition of the matrix

$\boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta}$

in (17):

$\boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} = (\boldsymbol{\Sigma}_1 \cdot \mathbf{U}_1)^H \cdot \boldsymbol{\Sigma}_1 \cdot \mathbf{U}_1$, (20)

where $\mathbf{U}_1$ is an upper triangular matrix with ones along the diagonal, and $\boldsymbol{\Sigma}_1$ is a
diagonal matrix with real entries. The ZF feedforward and feedback matrices then follow from:

$\mathbf{W}_{ZF} = \mathbf{U}_1 \cdot \mathbf{G}_{ZF} = \boldsymbol{\Sigma}_1^{-1} \cdot (\mathbf{U}_1^H \cdot \boldsymbol{\Sigma}_1)^{-1} \cdot \boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H$, $\mathbf{B}_{ZF} = \mathbf{U}_1 - \mathbf{I}_B$. (21)

[0078] The linear feedforward section $\mathbf{W}_{ZF}$ suppresses the ISI originating from "future" symbols, the
so-called pre-cursor ISI, whereas the non-linear feedback section $\mathbf{B}_{ZF}$ eliminates the ISI originating from
"past" symbols, the so-called post-cursor ISI.

[0079] Likewise, to design the decision feedback counterpart of the MMSE linear equalizer, we compute the Cholesky
decomposition of the matrix

$\boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} + \frac{\sigma_w^2}{\sigma_s^2} \mathbf{I}_B$

in (18):

$\boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} + \frac{\sigma_w^2}{\sigma_s^2} \mathbf{I}_B = (\boldsymbol{\Sigma}_2 \cdot \mathbf{U}_2)^H \cdot \boldsymbol{\Sigma}_2 \cdot \mathbf{U}_2$, (22)

where $\mathbf{U}_2$ is an upper triangular matrix with ones along the diagonal, and $\boldsymbol{\Sigma}_2$ is a
diagonal matrix with real entries. The MMSE feedforward and feedback matrices can then be calculated as:

$\mathbf{W}_{MMSE} = \mathbf{U}_2 \cdot \mathbf{G}_{MMSE} = \boldsymbol{\Sigma}_2^{-1} \cdot (\mathbf{U}_2^H \cdot \boldsymbol{\Sigma}_2)^{-1} \cdot \boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H$, $\mathbf{B}_{MMSE} = \mathbf{U}_2 - \mathbf{I}_B$. (23)

[0080] During the initialization phase, the feedforward and feedback filters are computed based on a Cholesky
decomposition, leading to an overall complexity of $O(QB^2)$. During the data processing phase, the feedforward and
feedback filters are applied to the received data according to (19), leading to a complexity of $O(QB)$. Note that the
decision feedback equalizers involve the same order of complexity as their linear counterparts.

A.2.d Separate Linear Equalization and Decoding. 

[0081] Previously, we have only considered joint equalization and decoding of the linear precoding. However, in order
to even further reduce the complexity with respect to the linear equalizers of Subsection A.2.b, equalization and
decoding can be performed separately as well:

$\hat{\mathbf{s}}^m[i] = \boldsymbol{\Theta}^H \cdot \tilde{\mathbf{G}} \cdot \tilde{\mathbf{y}}^m[i]$, (24)

where $\tilde{\mathbf{G}}$ performs linear equalization only and tries to restore $\tilde{\mathbf{s}}^m[i]$, and
$\boldsymbol{\Theta}^H$ subsequently performs linear decoding only and tries to restore $\mathbf{s}^m[i]$.

[0082] The ZF equalizer perfectly removes the amplitude and phase distortion:

$\tilde{\mathbf{G}}_{ZF} = (\tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}})^{-1} \cdot \tilde{\mathbf{H}}^H$, (25)

but also causes excessive noise enhancement, especially on those tones that experience a deep channel fade. Since
$\tilde{\mathbf{H}}$ is a diagonal matrix, the ZF equalizer decouples into Q parallel single-tap equalizers, acting on a
per-tone basis in the FD. The MMSE equalizer balances amplitude and phase distortion with noise enhancement and can be
expressed as:

$\tilde{\mathbf{G}}_{MMSE} = (\tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} + \sigma_w^2 \mathbf{R}_{\tilde{s}}^{-1})^{-1} \cdot \tilde{\mathbf{H}}^H$, (26)

where

$\mathbf{R}_{\tilde{s}} := E\{ \tilde{\mathbf{s}}^m[i] \cdot \tilde{\mathbf{s}}^m[i]^H \} = \sigma_s^2 \, \boldsymbol{\Theta} \cdot \boldsymbol{\Theta}^H$

is the covariance matrix of $\tilde{\mathbf{s}}^m[i]$. If we neglect the color in the precoded symbols,

$\mathbf{R}_{\tilde{s}} \approx \sigma_s^2 \mathbf{I}_Q$,

the MMSE equalizer also decouples into Q parallel and independent single-tap equalizers.

[0083] During the initialization phase, $\tilde{\mathbf{G}}_{ZF}$ and $\tilde{\mathbf{G}}_{MMSE}$ are calculated from
(25) and (26), respectively, where the matrix inversion reduces to Q parallel scalar divisions, leading to an overall
complexity of $O(Q)$. During the data processing phase, the received data is separately equalized and decoded, leading
to an overall complexity of $O(QB)$.
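
With a diagonal FD channel and the white approximation of the precoded symbols, (25) and (26) reduce to one scalar tap per tone. The following pure-Python sketch illustrates this under those assumptions; the function names and the numerical values are hypothetical and only serve to show the noise-enhancement behaviour on a deeply faded tone.

```python
def zf_taps(h_fd):
    """Per-tone ZF taps: conj(h) / |h|^2 = 1 / h (undefined for an exact zero)."""
    return [1 / h for h in h_fd]

def mmse_taps(h_fd, noise_var, symbol_var):
    """Per-tone MMSE taps, assuming the precoded symbols are white with variance symbol_var."""
    return [h.conjugate() / (abs(h) ** 2 + noise_var / symbol_var) for h in h_fd]

def equalize(y_fd, taps):
    """Data processing phase: one complex multiplication per tone."""
    return [g * y for g, y in zip(taps, y_fd)]

h_fd = [1 + 0j, 0.01 + 0j]   # illustrative channel: a strong tone and a deep fade
zf = zf_taps(h_fd)
mmse = mmse_taps(h_fd, noise_var=0.1, symbol_var=1.0)

# Noise gain on the faded tone: large for ZF, bounded for MMSE.
zf_gain, mmse_gain = abs(zf[1]), abs(mmse[1])

# At vanishing noise variance the MMSE taps coincide with the ZF taps.
mmse_high_snr = mmse_taps(h_fd, noise_var=0.0, symbol_var=1.0)
```

The initialization cost is Q scalar divisions, which matches the $O(Q)$ figure quoted above; the subsequent decoding with $\boldsymbol{\Theta}^H$ is a separate matrix-vector product and is not shown here.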

A.3 Extension to multiple antennas. 

[0084] As shown in Sections A.1 and A.2, MCBS-CDMA successfully addresses the challenges of broadband cellular
downlink communications. However, the spectral efficiency of single-antenna MCBS-CDMA is still limited by the
received signal-to-noise ratio and cannot be further improved by traditional communication techniques. As opposed
to single-antenna systems, Multiple-Input Multiple-Output (MIMO) systems that deploy $N_T$ transmit and $N_R$ receive
antennas enable an $N_{min}$-fold capacity increase in rich scattering environments, where
$N_{min} = \min\{N_T, N_R\}$ is called the multiplexing gain. Besides the time, frequency and code dimensions, MIMO
systems create an extra spatial dimension that allows the spectral efficiency to be increased and/or the performance to
be improved. On the one hand, Space Division Multiplexing (SDM) techniques achieve high spectral efficiency by
exploiting the spatial multiplexing gain. On the other hand, Space-Time Coding (STC) techniques achieve high
Quality-of-Service (QoS) by exploiting diversity and coding gains. Besides the leverages they offer, MIMO systems also
sharpen the challenges of broadband cellular downlink communications. First, time dispersion and ISI are now caused by
$N_T N_R$ frequency-selective multi-path fading channels instead of just 1. Second, MUI originates from $N_T M$ sources
instead of just M. Third, the presence of multiple antennas seriously impairs a low-complexity implementation of the
MS. To tackle these challenges, we will demonstrate the synergy between our MCBS-CDMA waveform and MIMO signal
processing. In particular, we focus on a space-time block coded MCBS-CDMA transmission, but the general principles
apply equally well to a space-time trellis coded or a space division multiplexed MCBS-CDMA transmission.

A.3.a Space-time block coded MCBS-CDMA transmission.

[0085] The block diagram in Fig. 6 describes the Space-Time Block Coded (STBC) MCBS-CDMA downlink transmission
scheme (where only the m-th user is explicitly shown), which transforms the M user data symbol sequences
$\{s^m[i]\}_{m=1}^{M}$ into $N_T$ ST coded multi-user chip sequences $\{u_{n_t}[n]\}_{n_t=1}^{N_T}$ with a rate
$1/T_c$. For conciseness, we limit ourselves to the case of $N_T = 2$ transmit antennas. As for the single-antenna
case, the information symbols are first grouped into blocks of B symbols and linearly precoded. Unlike the traditional
approach of performing ST encoding at the scalar symbol level, we perform ST encoding at the symbol block level.
Our ST encoder operates in the FD and takes two consecutive symbol blocks

$\{\tilde{\mathbf{s}}^m[2i], \tilde{\mathbf{s}}^m[2i+1]\}$

to output the following $2Q \times 2$ matrix of ST coded symbol blocks:

$\begin{bmatrix} \bar{\mathbf{s}}_1^m[2i] & \bar{\mathbf{s}}_1^m[2i+1] \\ \bar{\mathbf{s}}_2^m[2i] & \bar{\mathbf{s}}_2^m[2i+1] \end{bmatrix} = \begin{bmatrix} \tilde{\mathbf{s}}^m[2i] & -\tilde{\mathbf{s}}^m[2i+1]^* \\ \tilde{\mathbf{s}}^m[2i+1] & \tilde{\mathbf{s}}^m[2i]^* \end{bmatrix}$ (27)

[0086] At each time interval i, the ST coded symbol blocks $\bar{\mathbf{s}}_1^m[i]$ and $\bar{\mathbf{s}}_2^m[i]$
are forwarded to the first and the second transmit antenna, respectively. From (27), we can easily verify that the
transmitted symbol block at time instant 2i+1 from one antenna is the conjugate of the transmitted symbol block at time
instant 2i from the other antenna (with a possible sign change). This corresponds to a per-tone implementation of the
classical Alamouti scheme for frequency-flat fading channels. As we will show later, this property allows for
deterministic transmit stream separation at the receiver.

[0087] After ST encoding, the resulting symbol block sequences $\{\bar{\mathbf{s}}_{n_t}^m[i]\}_{n_t=1}^{N_T}$ are
block spread and code division multiplexed with those of the other users:

$\tilde{\mathbf{x}}_{n_t}[n] = \sum_{m=1}^{M} \bar{\mathbf{s}}_{n_t}^m[i] \, c^m[n]$, $n = iN + n'$. (28)

[0088] At this point, it is important to note that each of the $N_T$ parallel block sequences is block spread by the
same composite code sequence $c^m[n]$, guaranteeing an efficient utilization of the available code space. As will
become apparent later, this property allows for deterministic user separation at every receive antenna. After IFFT
transformation and the addition of some form of transmit redundancy:

$\mathbf{u}_{n_t}[n] = \mathbf{T} \cdot \mathbf{F}_Q^H \cdot \tilde{\mathbf{x}}_{n_t}[n]$, (29)

the corresponding scalar sequences $\{u_{n_t}[n]\}_{n_t=1}^{N_T}$ are transmitted over the air at a rate $1/T_c$.

A.3.b MUI-resilient MIMO reception.

[0089] The block diagram in Fig. 7 describes the reception scheme for the MS of interest, which transforms the
different received sequences $\{v_{n_r}[n]\}_{n_r=1}^{N_R}$ into an estimate of the desired user's data sequence
$\hat{s}^m[i]$. After transmit redundancy removal and FFT transformation, we obtain the multi-antenna counterpart of
(11):

$\tilde{\mathbf{Y}}_{n_r}[i] = \sum_{n_t=1}^{N_T} \tilde{\mathbf{H}}_{n_r,n_t} \cdot \tilde{\mathbf{X}}_{n_t}[i] + \tilde{\mathbf{Z}}_{n_r}[i]$, (30)

where

$\tilde{\mathbf{Y}}_{n_r}[i] := [\tilde{\mathbf{y}}_{n_r}[iN], \ldots, \tilde{\mathbf{y}}_{n_r}[(i+1)N-1]]$

stacks N consecutive received chip blocks $\tilde{\mathbf{y}}_{n_r}[n]$ at the $n_r$-th receive antenna,
$\tilde{\mathbf{H}}_{n_r,n_t}$ is the diagonal FD channel matrix from the $n_t$-th transmit to the $n_r$-th receive
antenna, and $\tilde{\mathbf{X}}_{n_t}[i]$ and $\tilde{\mathbf{Z}}_{n_r}[i]$ are similarly defined as
$\tilde{\mathbf{Y}}_{n_r}[i]$. From (28) and (30), we can conclude that our transceiver retains the user orthogonality
at each receive antenna, irrespective of the underlying frequency-selective multi-path channels. Like in the
single-antenna case, a low-complexity block despreading operation with the desired user's composite code vector
$\mathbf{c}^m[i]$ deterministically removes the MUI at each receive antenna:

$\tilde{\mathbf{y}}_{n_r}^m[i] := \tilde{\mathbf{Y}}_{n_r}[i] \cdot \mathbf{c}^m[i]^* = \sum_{n_t=1}^{N_T} \tilde{\mathbf{H}}_{n_r,n_t} \cdot \bar{\mathbf{s}}_{n_t}^m[i] + \tilde{\mathbf{z}}_{n_r}^m[i]$. (31)

[0090] Hence, our transceiver successfully converts (through block despreading) a multi-user MIMO detection problem
into an equivalent single-user MIMO equalization problem.

A.3.c Single-user space-time decoding. 

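[0091] After MUI elimination, the information blocks $\mathbf{s}^m[i]$ still need to be decoded from the received
block despread sequences $\{\tilde{\mathbf{y}}_{n_r}^m[i]\}_{n_r=1}^{N_R}$. Our ST decoder decomposes into three steps:
an initial ST decoding step and a transmit stream separation step for each receive antenna, and, finally, a receive
antenna combining step.
[0092] The initial ST decoding step considers two consecutive symbol blocks $\tilde{\mathbf{y}}_{n_r}^m[2i]$ and
$\tilde{\mathbf{y}}_{n_r}^m[2i+1]$, both satisfying the block input/output relationship of (31). By exploiting the ST
code structure of (27), we arrive at:

$\tilde{\mathbf{y}}_{n_r}^m[2i] = \tilde{\mathbf{H}}_{n_r,1} \cdot \bar{\mathbf{s}}_1^m[2i] + \tilde{\mathbf{H}}_{n_r,2} \cdot \bar{\mathbf{s}}_2^m[2i] + \tilde{\mathbf{z}}_{n_r}^m[2i]$, (32)

$\tilde{\mathbf{y}}_{n_r}^m[2i+1]^* = -\tilde{\mathbf{H}}_{n_r,1}^* \cdot \bar{\mathbf{s}}_2^m[2i] + \tilde{\mathbf{H}}_{n_r,2}^* \cdot \bar{\mathbf{s}}_1^m[2i] + \tilde{\mathbf{z}}_{n_r}^m[2i+1]^*$. (33)

[0093] Combining (32) and (33) into a single block matrix form, we obtain:

$\begin{bmatrix} \tilde{\mathbf{y}}_{n_r}^m[2i] \\ \tilde{\mathbf{y}}_{n_r}^m[2i+1]^* \end{bmatrix} = \bar{\mathbf{H}}_{n_r} \cdot \begin{bmatrix} \bar{\mathbf{s}}_1^m[2i] \\ \bar{\mathbf{s}}_2^m[2i] \end{bmatrix} + \begin{bmatrix} \tilde{\mathbf{z}}_{n_r}^m[2i] \\ \tilde{\mathbf{z}}_{n_r}^m[2i+1]^* \end{bmatrix}$, $\bar{\mathbf{H}}_{n_r} := \begin{bmatrix} \tilde{\mathbf{H}}_{n_r,1} & \tilde{\mathbf{H}}_{n_r,2} \\ \tilde{\mathbf{H}}_{n_r,2}^* & -\tilde{\mathbf{H}}_{n_r,1}^* \end{bmatrix}$, (34)

where

$\bar{\mathbf{s}}_1^m[2i] = \tilde{\mathbf{s}}^m[2i]$

and

$\bar{\mathbf{s}}_2^m[2i] = \tilde{\mathbf{s}}^m[2i+1]$

follow from (27). From the structure of $\bar{\mathbf{H}}_{n_r}$ in (34), we can deduce that our transceiver retains
the orthogonality among transmit streams at each receive antenna for each tone separately, regardless of the underlying
frequency-selective multi-path channels. A similar property was also encountered in the classical Alamouti scheme, but
only for single-user frequency-flat fading multi-path channels.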
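
The per-tone orthogonality of the two transmit streams can be checked with a small pure-Python sketch for one receive antenna. It is an illustration under stated assumptions rather than the decoder of the disclosure: the function names and the numerical values are hypothetical, the link is noise-free, and the combiner output is normalized by $|H_1|^2 + |H_2|^2$ per tone so that the precoded blocks are recovered directly.

```python
def st_encode(s_a, s_b):
    """ST encoding per (27): returns the blocks sent by antenna 1 and antenna 2
    at the time instants 2i and 2i+1."""
    ant1 = (s_a, [-x.conjugate() for x in s_b])
    ant2 = (s_b, [x.conjugate() for x in s_a])
    return ant1, ant2

def receive(ant1, ant2, h1, h2):
    """Noise-free per-tone reception at one receive antenna for both time instants."""
    Q = len(h1)
    y0 = [h1[q] * ant1[0][q] + h2[q] * ant2[0][q] for q in range(Q)]
    y1 = [h1[q] * ant1[1][q] + h2[q] * ant2[1][q] for q in range(Q)]
    return y0, y1

def st_combine(y0, y1, h1, h2):
    """Per-tone combining of y[2i] and conj(y[2i+1]) with the Hermitian of the
    2 x 2 channel matrix of (34), normalized by |h1|^2 + |h2|^2."""
    out_a, out_b = [], []
    for q in range(len(y0)):
        d2 = abs(h1[q]) ** 2 + abs(h2[q]) ** 2
        y1c = y1[q].conjugate()
        out_a.append((h1[q].conjugate() * y0[q] + h2[q] * y1c) / d2)
        out_b.append((h2[q].conjugate() * y0[q] - h1[q] * y1c) / d2)
    return out_a, out_b

s_a = [1 + 1j, -1 + 1j, 1 - 1j]          # precoded block at time 2i (Q = 3)
s_b = [-1 - 1j, 1 + 1j, -1 + 1j]         # precoded block at time 2i+1
h1 = [0.9 + 0.2j, -0.3 + 0.8j, 0.1 - 0.5j]   # assumed FD channel from antenna 1
h2 = [0.4 - 0.6j, 0.7 + 0.1j, -0.2 + 0.3j]   # assumed FD channel from antenna 2

ant1, ant2 = st_encode(s_a, s_b)
y0, y1 = receive(ant1, ant2, h1, h2)
est_a, est_b = st_combine(y0, y1, h1, h2)
```

The cross terms cancel on every tone, so `est_a` and `est_b` match `s_a` and `s_b` up to rounding without any matrix inversion.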

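[0094] The transmit stream separation step relies on this property to deterministically remove the transmit stream
interference through low-complexity linear processing. Let us define the $Q \times Q$ matrix $\mathbf{D}_{n_r}$ with
non-negative diagonal entries as:

$\mathbf{D}_{n_r} := \left( \tilde{\mathbf{H}}_{n_r,1} \cdot \tilde{\mathbf{H}}_{n_r,1}^* + \tilde{\mathbf{H}}_{n_r,2} \cdot \tilde{\mathbf{H}}_{n_r,2}^* \right)^{1/2}$.

From (34), we can verify that the channel matrix $\bar{\mathbf{H}}_{n_r}$ satisfies:

$\bar{\mathbf{H}}_{n_r}^H \cdot \bar{\mathbf{H}}_{n_r} = \mathbf{I}_2 \otimes \mathbf{D}_{n_r}^2$,

where $\otimes$ stands for Kronecker product. Based on $\bar{\mathbf{H}}_{n_r}$ and $\mathbf{D}_{n_r}$, we can
construct a unitary matrix

$\mathbf{U}_{n_r} := \bar{\mathbf{H}}_{n_r} \cdot (\mathbf{I}_2 \otimes \mathbf{D}_{n_r}^{-1})$.

Performing unitary combining on (34) (through $\mathbf{U}_{n_r}^H$) collects the transmit antenna diversity at the
$n_r$-th receive antenna:

$\begin{bmatrix} \acute{\mathbf{y}}_{n_r}^m[2i] \\ \acute{\mathbf{y}}_{n_r}^m[2i+1] \end{bmatrix} := \mathbf{U}_{n_r}^H \cdot \begin{bmatrix} \tilde{\mathbf{y}}_{n_r}^m[2i] \\ \tilde{\mathbf{y}}_{n_r}^m[2i+1]^* \end{bmatrix} = \begin{bmatrix} \mathbf{D}_{n_r} \cdot \tilde{\mathbf{s}}^m[2i] \\ \mathbf{D}_{n_r} \cdot \tilde{\mathbf{s}}^m[2i+1] \end{bmatrix} + \begin{bmatrix} \acute{\mathbf{z}}_{n_r}^m[2i] \\ \acute{\mathbf{z}}_{n_r}^m[2i+1] \end{bmatrix}$, (35)

where the resulting noise, obtained by applying $\mathbf{U}_{n_r}^H$ to the noise term of (34), is still white with
variance $\sigma_w^2$. Since multiplying with a unitary matrix preserves ML optimality, we can deduce from (35) that
the symbol blocks $\tilde{\mathbf{s}}^m[2i]$ and $\tilde{\mathbf{s}}^m[2i+1]$ can be decoded separately in an optimal
way. As a result, the different symbol blocks $\tilde{\mathbf{s}}^m[i]$ can be detected independently from:

$\acute{\mathbf{y}}_{n_r}^m[i] = \mathbf{D}_{n_r} \cdot \tilde{\mathbf{s}}^m[i] + \acute{\mathbf{z}}_{n_r}^m[i]$. (36)

[0095] Stacking the blocks from the different receive antennas for the final receive antenna combining step, we
obtain:

$\acute{\mathbf{y}}^m[i] := \begin{bmatrix} \acute{\mathbf{y}}_1^m[i] \\ \vdots \\ \acute{\mathbf{y}}_{N_R}^m[i] \end{bmatrix} = \acute{\mathbf{H}} \cdot \tilde{\mathbf{s}}^m[i] + \begin{bmatrix} \acute{\mathbf{z}}_1^m[i] \\ \vdots \\ \acute{\mathbf{z}}_{N_R}^m[i] \end{bmatrix}$, $\acute{\mathbf{H}} := \begin{bmatrix} \mathbf{D}_1 \\ \vdots \\ \mathbf{D}_{N_R} \end{bmatrix}$. (37)

[0096] At this point, we have only collected the transmit antenna diversity at each receive antenna, but still need to
collect the receive antenna diversity. Let us define the $Q \times Q$ matrix $\mathbf{D}$ with non-negative diagonal
entries as:

$\mathbf{D} := \left( \sum_{n_t=1}^{N_T} \sum_{n_r=1}^{N_R} \tilde{\mathbf{H}}_{n_r,n_t} \cdot \tilde{\mathbf{H}}_{n_r,n_t}^* \right)^{1/2}$.

From (37), we can verify that:

$\acute{\mathbf{H}}^H \cdot \acute{\mathbf{H}} = \mathbf{D}^2$.

Based on $\acute{\mathbf{H}}$ and $\mathbf{D}$, we can construct a tall unitary matrix

$\acute{\mathbf{U}} := \acute{\mathbf{H}} \cdot \mathbf{D}^{-1}$,

which satisfies $\acute{\mathbf{U}}^H \cdot \acute{\mathbf{U}} = \mathbf{I}_Q$ and

$\acute{\mathbf{U}}^H \cdot \acute{\mathbf{H}} = \mathbf{D}$.

Gathering the receive antenna diversity through multiplying (37) with $\acute{\mathbf{U}}^H$, we finally obtain:

$\breve{\mathbf{y}}^m[i] := \acute{\mathbf{U}}^H \cdot \acute{\mathbf{y}}^m[i] = \mathbf{D} \cdot \boldsymbol{\Theta} \cdot \mathbf{s}^m[i] + \breve{\mathbf{z}}^m[i]$, (38)

where the resulting noise $\breve{\mathbf{z}}^m[i]$, obtained by applying $\acute{\mathbf{U}}^H$ to the noise term of
(37), is still white with variance $\sigma_w^2$. Since the multiplication with a tall unitary matrix that does not
remove information also preserves ML decoding optimality, the blocks $\mathbf{s}^m[i]$ can be optimally decoded from
(38). Moreover, (38) has the same structure as its single-antenna counterpart in (13). Hence, the design of the linear
precoder $\boldsymbol{\Theta}$ in Subsection A.1.d, and the different equalization options that we have discussed in
Section A.2, can be applied here as well.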

A.4 Conclusion.

[0097] To cope with the challenges of broadband cellular downlink communications, we have designed a novel Mul- 
ti-Carrier (MC) CDMA transceiver that enables significant performance improvements compared to 3G cellular systems, 
yielding gains of up to 9 dB in full load situations. To this end, our so-called Multi-Carrier Block-Spread (MCBS) CDMA 
transceiver capitalizes on redundant block-spreading and linear precoding to preserve the orthogonality among users 
and to enable full multi-path diversity gains, regardless of the underlying multi-path channels. Different equalization 
options, ranging from linear to ML detection, strike the trade-off between performance and complexity. Specifically, the 
MMSE decision feedback equalizer realizes a 2.8 dB gain relative to its linear counterpart and performs within 1 .4 dB 
of the optimal ML detector. Finally, our transceiver demonstrates a rewarding synergy with multi-antenna techniques 
to increase the spectral efficiency and/or improve the link reliability over MIMO channels. Specifically, our STBC/ 
MCBS-CDMA transceiver retains the orthogonality among users as well as transmit streams to realize both multi-antenna
and multi-path diversity gains of $N_T N_R (L+1)$ for every user in the system, irrespective of the system load.
Moreover, a low-complexity linear MMSE detector, that performs either joint or separate equalization and decoding, 
approaches the optimal ML performance (within 0.4 dB for a (2,2) system) and comes close to extracting the full diversity 
in reduced as well as full load settings. 

B. Embodiment 

B.1. MC-DS-CDMA downlink system model. 
B.1.a. Transmitter model. 

[0098] Let us consider the downlink of a single-cell space-time coded MC-DS-CDMA system with U active mobile
stations. As depicted in Figure 8, at the base-station, which we suppose to have $M_t$ transmit antennas, a space-time
coded MC-DS-CDMA transmission scheme transforms the different user symbol sequences

$\{s^u[i]\}_{u=1}^{U}$

and the pilot symbol sequence $s^p[i]$ into $M_t$ time-domain space-time coded multi-user chip sequences
$\{u_{m_t}[n]\}_{m_t=1}^{M_t}$, where $u_{m_t}[n]$ is transmitted from the $m_t$-th transmit antenna. For simplicity,
we will assume in the following that the base-station has only $M_t = 2$ transmit antennas. Note however that the
proposed techniques can be extended to the more general case of $M_t > 2$ transmit antennas when resorting to the
generalized orthogonal designs. As shown in Figure 8, each user's data symbol sequence $s^u[i]$ (similarly for the
pilot symbol sequence $s^p[i]$) is serial-to-parallel converted into blocks of B symbols, leading to the symbol block
sequence

$\mathbf{s}^u[i] := [s^u[iB] \; \ldots \; s^u[(i+1)B-1]]^T$.

The symbol block sequence $\mathbf{s}^u[i]$ is linearly precoded by a $Q \times B$ matrix $\boldsymbol{\Theta}$, with
Q the number of tones, to yield the precoded symbol block sequence

$\tilde{\mathbf{s}}^u[i] := \boldsymbol{\Theta} \cdot \mathbf{s}^u[i]$.

The precoded symbol block sequence $\tilde{\mathbf{s}}^u[i]$ is demultiplexed into $M_t$ parallel sequences
$\{\tilde{\mathbf{s}}_{m_t}^u[i]\}_{m_t=1}^{M_t}$, where $M_t$ is the number of transmit antennas. Each of the u-th
user's precoded symbol block sequences $\{\tilde{\mathbf{s}}_{m_t}^u[i]\}_{m_t=1}^{M_t}$ is spread by a factor N with
the same user code sequence $c_u[n]$, which is the multiplication of the user specific orthogonal Walsh-Hadamard
spreading code and the base-station specific scrambling code. For each of the $M_t$ parallel streams, the different
user chip block sequences are added up together with the pilot chip block sequence, resulting in the $m_t$-th
multi-user chip block sequence:

$\tilde{\mathbf{x}}_{m_t}[n] = \sum_{u=1}^{U} \tilde{\mathbf{s}}_{m_t}^u[i] \, c_u[n] + \tilde{\mathbf{s}}_{m_t}^p[i] \, c_p[n]$ (39)

with $i = \lfloor n/N \rfloor$. The Space-Time (ST) encoder operates in the frequency-domain and takes the two
multi-user chip blocks $\{\tilde{\mathbf{x}}_{m_t}[n]\}_{m_t=1}^{2}$ to output the following $2Q \times 2$ matrix of ST
coded multi-user chip blocks:

$\begin{bmatrix} \bar{\mathbf{x}}_1[2n] & \bar{\mathbf{x}}_1[2n+1] \\ \bar{\mathbf{x}}_2[2n] & \bar{\mathbf{x}}_2[2n+1] \end{bmatrix} = \begin{bmatrix} \tilde{\mathbf{x}}_1[n] & -\tilde{\mathbf{x}}_2[n]^* \\ \tilde{\mathbf{x}}_2[n] & \tilde{\mathbf{x}}_1[n]^* \end{bmatrix}$ (40)

[0099] At each time interval n, the ST coded multi-user chip blocks $\bar{\mathbf{x}}_1[n]$ and
$\bar{\mathbf{x}}_2[n]$ are forwarded to the first and the second transmit antenna, respectively. From Equation 40, we
can easily verify that the transmitted multi-user chip block at time instant 2n+1 from one antenna is the conjugate of
the transmitted multi-user chip block at time instant 2n from the other antenna (with a possible sign change). The
$Q \times Q$ IFFT matrix $\mathbf{F}_Q^H$ transforms the frequency-domain ST coded multi-user chip block sequence
$\bar{\mathbf{x}}_{m_t}[n]$ into the time-domain ST coded multi-user chip block sequence
$\mathbf{x}_{m_t}[n] = \mathbf{F}_Q^H \cdot \bar{\mathbf{x}}_{m_t}[n]$. The $K \times Q$ transmit matrix $\mathbf{T}$,
with $K = Q + \mu$, adds a cyclic prefix of length $\mu$ to each block of the time-domain ST coded multi-user chip
block sequence $\mathbf{x}_{m_t}[n]$, leading to the time-domain transmitted multi-user chip block sequence
$\mathbf{u}_{m_t}[n] = \mathbf{T} \cdot \mathbf{x}_{m_t}[n]$. Finally, the time-domain transmitted multi-user chip
block sequence $\mathbf{u}_{m_t}[n]$ is parallel-to-serial converted into K chips, obtaining the time-domain
transmitted multi-user chip sequence

$[u_{m_t}[nK] \; \ldots \; u_{m_t}[(n+1)K-1]]^T := \mathbf{u}_{m_t}[n]$.



B.1.b Receiver model. 



[0100] One assumes that the mobile station of interest is equipped with $M_r$ receive antennas and has acquired perfect synchronisation. As shown in Figure 9, at each receive antenna, the time-domain received chip sequence $v_{m_r}[n]$ is serial-to-parallel converted into blocks of K chips, resulting in the time-domain received chip block sequence

$$\mathbf{v}_{m_r}[n] := [v_{m_r}[nK] \; \ldots \; v_{m_r}[(n+1)K-1]]^T.$$

The $Q \times K$ receive matrix $\mathbf{R}$ discards the cyclic prefix of each block of the time-domain received chip block sequence $\mathbf{v}_{m_r}[n]$, leading to the time-domain received ST coded chip block sequence $\mathbf{y}_{m_r}[n] = \mathbf{R} \mathbf{v}_{m_r}[n]$. By transforming the time-domain received ST coded chip block sequence $\mathbf{y}_{m_r}[n]$ into the frequency-domain, $\tilde{\mathbf{y}}_{m_r}[n] := \mathbf{F}_Q \mathbf{y}_{m_r}[n]$ with the $Q \times Q$ FFT matrix $\mathbf{F}_Q$, and assuming a sufficiently long cyclic prefix $\mu \geq L$, we obtain a simple input/output relationship in the frequency-domain:

$$\tilde{\mathbf{y}}_{m_r}[n] = \sum_{m_t=1}^{M_t} \tilde{\mathbf{H}}_{m_r,m_t} \tilde{\bar{\mathbf{x}}}_{m_t}[n] + \tilde{\mathbf{e}}_{m_r}[n] \qquad (41)$$



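where $\tilde{\mathbf{e}}_{m_r}[n]$ is the frequency-domain received noise block sequence and $\tilde{\mathbf{H}}_{m_r,m_t}$ the $Q \times Q$ diagonal frequency-domain channel matrix having the frequency-domain channel response $\tilde{\mathbf{h}}_{m_r,m_t}$ as its main diagonal. Exploiting the structure of the ST code design in Equation 40, we can write for two consecutive chip blocks $\tilde{\mathbf{y}}_{m_r}[2n]$ and $\tilde{\mathbf{y}}_{m_r}[2n+1]$ the frequency-domain input/output relationship of Equation 41, resulting in Equation 42:

$$\underbrace{\begin{bmatrix} \tilde{\mathbf{y}}_{m_r}[2n] \\ \tilde{\mathbf{y}}_{m_r}^*[2n+1] \end{bmatrix}}_{\bar{\mathbf{y}}_{m_r}[n]} = \underbrace{\begin{bmatrix} \tilde{\mathbf{H}}_{m_r,1} & \tilde{\mathbf{H}}_{m_r,2} \\ \tilde{\mathbf{H}}_{m_r,2}^* & -\tilde{\mathbf{H}}_{m_r,1}^* \end{bmatrix}}_{\bar{\mathbf{H}}_{m_r}} \underbrace{\begin{bmatrix} \tilde{\mathbf{x}}_1[n] \\ \tilde{\mathbf{x}}_2[n] \end{bmatrix}}_{\tilde{\mathbf{x}}[n]} + \underbrace{\begin{bmatrix} \tilde{\mathbf{e}}_{m_r}[2n] \\ \tilde{\mathbf{e}}_{m_r}^*[2n+1] \end{bmatrix}}_{\bar{\mathbf{e}}_{m_r}[n]} \qquad (42)$$

Stacking the contributions of the $M_r$ receive antennas,

$$\bar{\mathbf{y}}[n] = [\bar{\mathbf{y}}_1^T[n] \; \ldots \; \bar{\mathbf{y}}_{M_r}^T[n]]^T,$$

we obtain the following per receive antenna frequency-domain data model:

$$\bar{\mathbf{y}}[n] = \bar{\mathbf{H}} \tilde{\mathbf{x}}[n] + \bar{\mathbf{e}}[n] \qquad (43)$$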
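The orthogonal structure of the channel matrix in Equation 42 can be checked on a single tone and a single receive antenna. The sketch below is not the chip equalizer receiver developed in the following sections; it only assumes known, noise-free scalar channel gains `h1` and `h2` (hypothetical values) to show that matched filtering with that matrix separates the two transmitted chips.

```python
def st_channel(x1, x2, h1, h2):
    """Noise-free received chips on one tone at times 2n and 2n+1."""
    y0 = h1 * x1 + h2 * x2
    y1 = -h1 * x2.conjugate() + h2 * x1.conjugate()
    return y0, y1

def combine(y0, y1, h1, h2):
    """Matched filtering of [y0, conj(y1)] with the 2x2 orthogonal channel matrix."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    x1_hat = (h1.conjugate() * y0 + h2 * y1.conjugate()) / gain
    x2_hat = (h2.conjugate() * y0 - h1 * y1.conjugate()) / gain
    return x1_hat, x2_hat

x1, x2 = 1 + 1j, -1 + 1j            # chips on one tone (illustrative)
h1, h2 = 0.8 - 0.3j, -0.2 + 0.6j    # assumed per-tone channel gains
y0, y1 = st_channel(x1, x2, h1, h2)
x1_hat, x2_hat = combine(y0, y1, h1, h2)
```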



where the per receive antenna channel matrix $\bar{\mathbf{H}}$ and the per receive antenna noise block $\bar{\mathbf{e}}[n]$ are similarly defined as the per receive antenna output block $\bar{\mathbf{y}}[n]$. Defining the receive permutation matrix $\mathbf{P}_r$ respectively the transmit permutation matrix $\mathbf{P}_t$ as follows:

$$\bar{\mathbf{y}}'[n] := \mathbf{P}_r \bar{\mathbf{y}}[n], \qquad \tilde{\mathbf{x}}[n] := \mathbf{P}_t \tilde{\mathbf{x}}'[n] \qquad (44)$$



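where $\mathbf{P}_r$ permutes a per receive antenna ordering into a per-tone ordering and where $\mathbf{P}_t$ conversely permutes a per-tone ordering into a per transmit antenna ordering, we obtain the following per-tone data model:

$$\bar{\mathbf{y}}'[n] = \bar{\mathbf{H}}' \tilde{\mathbf{x}}'[n] + \bar{\mathbf{e}}'[n] \qquad (45)$$

[0101] In this equation,

$$\bar{\mathbf{y}}'[n] = [\bar{\mathbf{y}}_1'^T[n] \; \ldots \; \bar{\mathbf{y}}_Q'^T[n]]^T$$

is the per-tone output block, $\tilde{\mathbf{x}}'[n]$ the per-tone input block and $\bar{\mathbf{e}}'[n]$ the per-tone noise block, similarly defined as $\bar{\mathbf{y}}'[n]$. The per-tone channel matrix $\bar{\mathbf{H}}'$ is a block diagonal matrix, given by:

$$\bar{\mathbf{H}}' := \mathbf{P}_r \bar{\mathbf{H}} \mathbf{P}_t = \mathrm{diag}\{\bar{\mathbf{H}}_1', \ldots, \bar{\mathbf{H}}_Q'\} \qquad (46)$$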



B.1.c Data model for burst processing. 

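[0102] Assuming a burst length of $M_t \cdot B \cdot I$ symbols for each user, we can stack $I \cdot N$ consecutive chip blocks $\bar{\mathbf{y}}[n]$, defined in Equation 43, into

$$\bar{\mathbf{Y}} := [\bar{\mathbf{y}}[0] \; \ldots \; \bar{\mathbf{y}}[IN-1]],$$

leading to the following per receive antenna data model for burst processing:

$$\bar{\mathbf{Y}} = \bar{\mathbf{H}} \tilde{\mathbf{X}} + \bar{\mathbf{E}} \qquad (47)$$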



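where the input matrix $\tilde{\mathbf{X}}$ and the noise matrix $\bar{\mathbf{E}}$ are similarly defined as the output matrix $\bar{\mathbf{Y}}$. By having a look at the definition of $\tilde{\mathbf{x}}[n]$ in Equation 42 and by inspecting Equation 39, we can write $\tilde{\mathbf{X}}$ as follows:

$$\tilde{\mathbf{X}} = \tilde{\mathbf{S}}_d \mathbf{C}_d + \tilde{\mathbf{S}}_p \mathbf{C}_p \qquad (48)$$

where the multi-user total data symbol matrix

$$\tilde{\mathbf{S}}_d := [\tilde{\mathbf{S}}_1 \; \ldots \; \tilde{\mathbf{S}}_U]$$

stacks the total data symbol matrices of the different active users and the u-th user's total data symbol matrix

$$\tilde{\mathbf{S}}_u := [\tilde{\mathbf{S}}_1^{uT} \; \ldots \; \tilde{\mathbf{S}}_{M_t}^{uT}]^T$$

stacks the u-th user's data symbol matrices for the different transmit antennas. The u-th user's data symbol matrix for the $m_t$-th transmit antenna

$$\tilde{\mathbf{S}}_{m_t}^u := [\tilde{\mathbf{s}}_{m_t}^u[0] \; \ldots \; \tilde{\mathbf{s}}_{m_t}^u[I-1]]$$

stacks I consecutive precoded symbol blocks for the u-th user and the $m_t$-th transmit antenna. The total pilot symbol matrix $\tilde{\mathbf{S}}_p$ and the pilot symbol matrix for the $m_t$-th transmit antenna $\tilde{\mathbf{S}}_{m_t}^p$ are similarly defined as $\tilde{\mathbf{S}}_u$ respectively $\tilde{\mathbf{S}}_{m_t}^u$. The multi-user code matrix

$$\mathbf{C}_d := [\mathbf{C}_1^T \; \ldots \; \mathbf{C}_U^T]^T$$

stacks the code matrices of the different active users. The u-th user's code matrix stacks the u-th user's code vectors at I consecutive symbol instants:

$$\mathbf{C}_u := \begin{bmatrix} \mathbf{c}_u[0] & & \\ & \ddots & \\ & & \mathbf{c}_u[I-1] \end{bmatrix} \qquad (49)$$

where

$$\mathbf{c}_u[i] = [c_u[iN] \; \ldots \; c_u[(i+1)N-1]]$$

is the u-th user's code vector used to spread the precoded symbol blocks $\{\tilde{\mathbf{s}}_{m_t}^u[i]\}_{m_t=1}^{M_t}$.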

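[0103] Similarly to the per receive antenna data model for burst processing in Equation 47, we can stack $I \cdot N$ consecutive chip blocks $\bar{\mathbf{y}}'[n]$, leading to the following per-tone data model for burst processing:

$$\bar{\mathbf{Y}}' = \bar{\mathbf{H}}' \tilde{\mathbf{X}}' + \bar{\mathbf{E}}' \qquad (50)$$

Using Equations 44 and 48 we can express $\tilde{\mathbf{X}}'$ as follows:

$$\tilde{\mathbf{X}}' = \tilde{\mathbf{S}}_d' \mathbf{C}_d + \tilde{\mathbf{S}}_p' \mathbf{C}_p \qquad (51)$$

where $\tilde{\mathbf{S}}_d' := \mathbf{P}_t^T \tilde{\mathbf{S}}_d$ and $\tilde{\mathbf{S}}_p' := \mathbf{P}_t^T \tilde{\mathbf{S}}_p$ are the per-tone permuted versions of $\tilde{\mathbf{S}}_d$ respectively $\tilde{\mathbf{S}}_p$.
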
B.2. Per-tone burst chip equalizers. 

[0104] Inspired by our related work for the DS-CDMA downlink, we can now deal with the design of the chip equalizers. Starting from Equation 50 and assuming that the channel matrix $\bar{\mathbf{H}}'$ has full column rank and the input matrix $\tilde{\mathbf{X}}'$ has full row rank, it is possible to find a Zero-Forcing (ZF) chip equalizer matrix $\mathbf{G}'$, for which:

$$\mathbf{G}' \bar{\mathbf{Y}}' - \tilde{\mathbf{X}}' = \mathbf{0} \qquad (52)$$

provided there is no noise present in the output matrix $\bar{\mathbf{Y}}'$. Since the channel matrix $\bar{\mathbf{H}}'$ has a block diagonal structure, as shown in Equation 46, the equalizer matrix $\mathbf{G}'$ suffices to have a block diagonal structure as well:

$$\mathbf{G}' := \mathrm{diag}\{\mathbf{G}_1', \ldots, \mathbf{G}_Q'\} \qquad (53)$$

acting on a per-tone basis. For this reason, the ZF problem of Equation 52 decouples into Q parallel and independent ZF problems, one for each tone. Using Equation 51, we can rewrite the original ZF problem of Equation 52 as follows:

$$\mathbf{G}' \bar{\mathbf{Y}}' - \tilde{\mathbf{S}}_d' \mathbf{C}_d - \tilde{\mathbf{S}}_p' \mathbf{C}_p = \mathbf{0} \qquad (54)$$

which is a ZF problem in both the equalizer matrix $\mathbf{G}'$ and the multi-user total data symbol matrix $\tilde{\mathbf{S}}_d'$.

B.2.a Training-based burst chip equalizer. 

[0105] The training-based chip equalizer determines its equalizer coefficients from the per-tone output matrix $\bar{\mathbf{Y}}'$ based on the knowledge of the pilot code matrix $\mathbf{C}_p$ and the total pilot symbol matrix $\tilde{\mathbf{S}}_p'$. By despreading Equation 54 with the pilot code matrix $\mathbf{C}_p$, we obtain:

$$\mathbf{G}' \bar{\mathbf{Y}}' \mathbf{C}_p^H - \tilde{\mathbf{S}}_p' = \mathbf{0} \qquad (55)$$

because of the orthogonality between the multi-user code matrix $\mathbf{C}_d$ and the pilot code matrix $\mathbf{C}_p$. In case noise is present in the output matrix $\bar{\mathbf{Y}}'$, we have to solve the corresponding Least Squares (LS) minimisation problem:

$$\hat{\mathbf{G}}' = \arg\min_{\mathbf{G}'} \left\| \mathbf{G}' \bar{\mathbf{Y}}' \mathbf{C}_p^H - \tilde{\mathbf{S}}_p' \right\|_F^2 \qquad (56)$$

which can be interpreted as follows. The equalized output matrix $\mathbf{G}' \bar{\mathbf{Y}}'$ is despread with the pilot code matrix $\mathbf{C}_p$. The equalized output matrix after despreading, $\mathbf{G}' \bar{\mathbf{Y}}' \mathbf{C}_p^H$, should then be as close as possible to the known total pilot symbol matrix $\tilde{\mathbf{S}}_p'$ in a Least Squares sense.
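The training-based cost function of Equation 56 can be illustrated on a deliberately reduced case: one tone, one transmit and one receive antenna, so that the per-tone equalizer collapses to a single complex coefficient. This is a sketch under those assumptions, with hypothetical codes, symbols and channel gain; the general case requires a matrix least-squares solve per tone.

```python
def despread(chips, code, N):
    """Correlate each group of N chips with the matching code chips (normalised by N)."""
    num_symbols = len(chips) // N
    return [sum(chips[i * N + n] * complex(code[i * N + n]).conjugate() for n in range(N)) / N
            for i in range(num_symbols)]

def ls_equalizer(despread_out, pilots):
    """Scalar least squares: argmin over g of sum_i |g * z[i] - s_p[i]|^2."""
    num = sum(s * z.conjugate() for s, z in zip(pilots, despread_out))
    den = sum(abs(z) ** 2 for z in despread_out)
    return num / den

N = 4                                  # spreading factor
c_user = [1, -1, 1, -1] * 2            # user code over two symbol instants (assumed)
c_pilot = [1, 1, -1, -1] * 2           # pilot code, orthogonal to the user code
s_user = [1 + 1j, -1 - 1j]             # unknown user symbols
s_pilot = [1 + 0j, -1 + 0j]            # known pilot symbols
h = 0.5 - 0.5j                         # assumed channel gain on this tone
y = [h * (s_user[n // N] * c_user[n] + s_pilot[n // N] * c_pilot[n]) for n in range(2 * N)]

g = ls_equalizer(despread(y, c_pilot, N), s_pilot)       # trained on the pilot only
s_user_hat = despread([g * v for v in y], c_user, N)     # user symbols after equalization
```

In this noise-free setting the trained coefficient equals the inverse of the channel gain, and despreading the equalized chips with the user code returns the user symbols.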

B.2.b Semi-blind burst chip equalizer. 

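[0106] The semi-blind chip equalizer determines its equalizer coefficients from the per-tone output matrix $\bar{\mathbf{Y}}'$ based on the knowledge of the multi-user code matrix $\mathbf{C}_d$, the pilot code matrix $\mathbf{C}_p$ and the total pilot symbol matrix $\tilde{\mathbf{S}}_p'$. Solving Equation 54 first for $\tilde{\mathbf{S}}_d'$, assuming $\mathbf{G}'$ to be known and fixed, gives

$$\hat{\tilde{\mathbf{S}}}_d' = \mathbf{G}' \bar{\mathbf{Y}}' \mathbf{C}_d^H.$$

Substituting $\hat{\tilde{\mathbf{S}}}_d'$ into Equation 54 leads to a semi-blind ZF problem in $\mathbf{G}'$ only:

$$\mathbf{G}' \bar{\mathbf{Y}}' (\mathbf{I}_{IN} - \mathbf{C}_d^H \mathbf{C}_d) - \tilde{\mathbf{S}}_p' \mathbf{C}_p = \mathbf{0} \qquad (57)$$

In case noise is present in the output matrix $\bar{\mathbf{Y}}'$, we have to solve the corresponding LS minimisation problem:

$$\hat{\mathbf{G}}' = \arg\min_{\mathbf{G}'} \left\| \mathbf{G}' \bar{\mathbf{Y}}' (\mathbf{I}_{IN} - \mathbf{C}_d^H \mathbf{C}_d) - \tilde{\mathbf{S}}_p' \mathbf{C}_p \right\|_F^2 \qquad (58)$$

which can be interpreted as follows. The equalized output matrix $\mathbf{G}' \bar{\mathbf{Y}}'$ is projected on the orthogonal complement of the subspace spanned by the multi-user code matrix $\mathbf{C}_d$. The equalized output matrix after projecting, $\mathbf{G}' \bar{\mathbf{Y}}' (\mathbf{I}_{IN} - \mathbf{C}_d^H \mathbf{C}_d)$, should then be as close as possible to the known total pilot chip matrix $\tilde{\mathbf{S}}_p' \mathbf{C}_p$ in a Least Squares sense.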
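The projection used by the semi-blind method can be checked numerically. The sketch below assumes one symbol instant (I = 1), spreading factor N = 4 and unit-norm Walsh-type codes chosen for illustration only; it confirms that the projection removes every active user code while leaving the orthogonal pilot code untouched.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def hermitian(a):
    """Conjugate transpose."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))] for j in range(len(a[0]))]

def projection(c_d):
    """I - C_d^H C_d, assuming the rows of C_d are orthonormal."""
    n = len(c_d[0])
    gram = matmul(hermitian(c_d), c_d)
    return [[(1 if i == j else 0) - gram[i][j] for j in range(n)] for i in range(n)]

c_d = [[0.5, 0.5, 0.5, 0.5],       # user 1 code (assumed)
       [0.5, -0.5, 0.5, -0.5]]     # user 2 code (assumed)
c_p = [[0.5, 0.5, -0.5, -0.5]]     # pilot code, orthogonal to both user codes

p_d = projection(c_d)
users_after = matmul(c_d, p_d)     # expected to vanish
pilot_after = matmul(c_p, p_d)     # expected to equal c_p
```

Codes that are not assigned to any user also pass through the projection unchanged, which is the extra information the semi-blind method exploits.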
B.2.c User-specific detection. 

[0107] The obtained per-tone pilot-trained chip equalizer matrix $\hat{\mathbf{G}}'$, whether training-based or semi-blind, may subsequently be used to extract the desired user's total data symbol matrix:

$$\hat{\mathbf{S}}_u = \bar{\boldsymbol{\Theta}}^H \mathbf{P}_t \hat{\mathbf{G}}' \bar{\mathbf{Y}}' \mathbf{C}_u^H \qquad (59)$$

where the equalized output matrix $\hat{\mathbf{G}}' \bar{\mathbf{Y}}'$ is first despread with the desired user's code matrix $\mathbf{C}_u$. Next, the transmit permutation matrix $\mathbf{P}_t$ permutes the per-tone ordering of the despread equalized output matrix into a per transmit antenna ordering. Finally, the total precoding matrix $\bar{\boldsymbol{\Theta}}$ linearly decodes the permuted version of the despread equalized output matrix, where $\bar{\boldsymbol{\Theta}}$ is an $M_t Q \times M_t B$ block diagonal matrix with the precoding matrix $\boldsymbol{\Theta}$ on its main diagonal.
[0108] We can conclude that the per-tone pilot-trained chip equalizer with the training-based cost function is a promising technique for downlink reception in future broadband wireless communication systems based on a space-time coded MC-DS-CDMA transmission scheme.

C. Embodiment 

C.1. SCBT-DS-CDMA downlink system model. 

[0109] Let us consider the downlink of a single-cell space-time block coded SCBT-DS-CDMA system with U active 
mobile stations. The base-station is equipped with M t transmit (TX) antennas whereas the mobile station of interest is 
equipped with M r receive (RX) antennas. 

C.1.a Transmitter model for the base station.

[0110] For simplicity, we will assume in the following that the base station has only $M_t = 2$ transmit antennas. Note however that the proposed techniques can be extended to the more general case of $M_t > 2$ transmit antennas when resorting to the generalized orthogonal designs. As shown in Figure 10, each user's data symbol sequence $s^u[i]$ (similarly for the pilot symbol sequence $s^p[i]$) is demultiplexed into $M_t$ parallel lower rate sequences

$$\{ s_{m_t}^u[i] := s^u[i M_t + m_t - 1] \}_{m_t=1}^{M_t},$$

where $M_t$ is the number of transmit antennas. Each of the u-th user's symbol sequences $\{ s_{m_t}^u[i] \}_{m_t=1}^{M_t}$ is serial-to-parallel converted into blocks of B symbols, leading to the symbol block sequences

$$\{ \mathbf{s}_{m_t}^u[i] := [s_{m_t}^u[iB], \ldots, s_{m_t}^u[(i+1)B-1]]^T \}_{m_t=1}^{M_t}$$

that are subsequently spread by a factor N with the same user composite code sequence $c_u[n]$, which is the multiplication of the user specific orthogonal Walsh-Hadamard spreading code and the base station specific scrambling code. For each of the $M_t$ parallel streams, the different user chip block sequences and the pilot chip block sequence are added, resulting in the $m_t$-th multi-user chip block sequence:

$$\mathbf{x}_{m_t}[n] = \sum_{u=1}^{U} \mathbf{s}_{m_t}^u[i] \, c_u[n] + \mathbf{s}_{m_t}^p[i] \, c_p[n], \qquad i = \left\lfloor \frac{n}{N} \right\rfloor \qquad (60)$$
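The block spreading and summation of Equation 60 can be sketched for one transmit stream as follows. The Sylvester construction of the Walsh-Hadamard codes, the real-valued scrambling sequence and all symbol values are illustrative assumptions; the pilot is simply treated as one more code channel.

```python
def walsh_hadamard(n):
    """Rows of the n x n Sylvester Hadamard matrix (n a power of two)."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows

def composite_code(walsh_row, scrambling):
    """Chip-wise product of the repeated Walsh code and the scrambling code."""
    N = len(walsh_row)
    return [walsh_row[n % N] * scrambling[n] for n in range(len(scrambling))]

def spread_and_sum(symbol_blocks, codes, N):
    """x[n] = sum over code channels of s[n // N] * c[n]; each s[i] is a block of B symbols."""
    num_chips = N * len(symbol_blocks[0])
    chips = []
    for n in range(num_chips):
        i = n // N
        B = len(symbol_blocks[0][i])
        chips.append([sum(blocks[i][b] * code[n] for blocks, code in zip(symbol_blocks, codes))
                      for b in range(B)])
    return chips

N = 4
scrambling = [1, -1, -1, 1, -1, -1, 1, 1]          # assumed scrambling code, I * N chips
walsh = walsh_hadamard(N)
codes = [composite_code(walsh[k], scrambling) for k in (1, 2, 3)]  # user 1, user 2, pilot
symbol_blocks = [
    [[1, -1], [1, 1]],      # user 1: I = 2 blocks of B = 2 symbols
    [[-1, -1], [1, -1]],    # user 2
    [[1, 1], [1, 1]],       # pilot
]
x = spread_and_sum(symbol_blocks, codes, N)

# Despreading the first N chip blocks with user 1's code returns its first symbol block.
recovered = [sum(x[n][b] * codes[0][n] for n in range(N)) / N for b in range(2)]
```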



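Let us also define the u-th user's total symbol block sequence

$$\mathbf{s}^u[i] := [\mathbf{s}_1^{uT}[i], \mathbf{s}_2^{uT}[i]]^T$$

and the total multi-user chip block sequence

$$\mathbf{x}[n] := [\mathbf{x}_1^T[n], \mathbf{x}_2^T[n]]^T.$$

The block Space-Time (ST) encoder operates in the time-domain (TD) at the chip block level rather than at the symbol block level and takes the two multi-user chip blocks $\{\mathbf{x}_{m_t}[n]\}_{m_t=1}^{2}$ to output the following $2B \times 2$ matrix of ST coded multi-user chip blocks:

$$\begin{bmatrix} \bar{\mathbf{x}}_1[2n] & \bar{\mathbf{x}}_1[2n+1] \\ \bar{\mathbf{x}}_2[2n] & \bar{\mathbf{x}}_2[2n+1] \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1[n] & -\mathbf{P}_B^{(0)} \mathbf{x}_2^*[n] \\ \mathbf{x}_2[n] & \mathbf{P}_B^{(0)} \mathbf{x}_1^*[n] \end{bmatrix} \qquad (61)$$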



where $\mathbf{P}_J^{(j)}$ is a $J \times J$ permutation matrix implementing a reversed cyclic shift over j positions. At each time interval n, the ST coded multi-user chip blocks $\bar{\mathbf{x}}_1[n]$ and $\bar{\mathbf{x}}_2[n]$ are forwarded to the first and the second transmit antenna, respectively. From Equation 61, we can easily verify that the transmitted multi-user chip block at time instant 2n+1 from one antenna is the time-reversed conjugate of the transmitted multi-user chip block at time instant 2n from the other antenna (with a possible sign change). The $K \times B$ transmit matrix $\mathbf{T}_{zp}$, with $K = B + \mu$, pads a zero postfix of length $\mu$ to each block of the ST coded multi-user chip block sequence $\bar{\mathbf{x}}_{m_t}[n]$, leading to the transmitted multi-user chip block sequence $\mathbf{u}_{m_t}[n] = \mathbf{T}_{zp} \bar{\mathbf{x}}_{m_t}[n]$. Finally, the transmitted multi-user chip block sequence $\mathbf{u}_{m_t}[n]$ is parallel-to-serial converted into the transmitted multi-user chip sequence

$$[u_{m_t}[nK], \ldots, u_{m_t}[(n+1)K-1]]^T := \mathbf{u}_{m_t}[n].$$
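A short sketch of the time-reversal ST encoder and the zero padding may help here. The text does not spell out the permutation, so the indexing used below, output entry p (counted from 1) equal to input entry (J - p + j) mod J, is an assumption chosen so that j = 0 gives a plain time reversal; the function names are likewise hypothetical.

```python
def reverse_cyclic_shift(a, j):
    """Assumed form of P_J^(j): output entry p (1-based) is a[(J - p + j) mod J]."""
    J = len(a)
    return [a[(J - p + j) % J] for p in range(1, J + 1)]

def zero_pad(block, mu):
    """Append a zero postfix of length mu (the action of T_zp)."""
    return list(block) + [0] * mu

def tr_st_encode(x1, x2):
    """Blocks sent at times 2n and 2n+1 from antenna 1 and antenna 2 (Equation 61)."""
    ant1 = (list(x1), [-v.conjugate() for v in reverse_cyclic_shift(x2, 0)])
    ant2 = (list(x2), [v.conjugate() for v in reverse_cyclic_shift(x1, 0)])
    return ant1, ant2

x1 = [1 + 1j, 2 - 1j, 3 + 0j]      # B = 3 chips (illustrative)
x2 = [-1 + 0j, 1j, 2 + 2j]
mu = 2
ant1, ant2 = tr_st_encode(x1, x2)

# Time reversal of a block followed by zero padding equals a reversed cyclic
# shift over B positions of the zero-padded block.
lhs = zero_pad(reverse_cyclic_shift(x1, 0), mu)
rhs = reverse_cyclic_shift(zero_pad(x1, mu), len(x1))
```

Under this assumed indexing the two sides agree, which is why the receiver can undo the time reversal with a K x K permutation applied to the whole zero-padded block.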






C.1.b Receiver model for the mobile station. 

[0111] It is assumed that the mobile station of interest is equipped with $M_r$ receive antennas and has acquired perfect synchronisation. At each receive antenna in Figure 11, the TD received chip sequence $v_{m_r}[n]$ is serial-to-parallel converted into blocks of K chips, resulting in the TD received chip block sequence

$$\mathbf{v}_{m_r}[n] := [v_{m_r}[nK], \ldots, v_{m_r}[(n+1)K-1]]^T.$$

The $K \times K$ receive matrix $\mathbf{R} := \mathbf{I}_K$ completely preserves each block of the TD received chip block sequence $\mathbf{v}_{m_r}[n]$, leading to the TD received ST coded chip block sequence $\mathbf{y}_{m_r}[n] = \mathbf{R} \mathbf{v}_{m_r}[n]$. Assuming a sufficiently long zero postfix $\mu \geq L$ (L is the maximum channel order), we obtain a simple input/output relationship in the time-domain:

$$\mathbf{y}_{m_r}[n] = \sum_{m_t=1}^{M_t} \dot{\mathbf{H}}_{m_r,m_t} \mathbf{u}_{m_t}[n] + \mathbf{e}_{m_r}[n] \qquad (62)$$



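where $\mathbf{e}_{m_r}[n]$ is the TD received noise block sequence and $\dot{\mathbf{H}}_{m_r,m_t}$ is a $K \times K$ circulant channel matrix. We consider two consecutive chip blocks and define $\mathbf{y}_{m_r,1}[n] := \mathbf{y}_{m_r}[2n]$ and $\mathbf{y}_{m_r,2}[n] := \mathbf{P}_K^{(B)} \mathbf{y}_{m_r}^*[2n+1]$. Transforming $\mathbf{y}_{m_r,1}[n]$ and $\mathbf{y}_{m_r,2}[n]$ to the frequency-domain (FD) employing the $K \times K$ FFT matrix $\mathbf{F}_K$ leads to the following input/output relationship:

$$\underbrace{\begin{bmatrix} \mathbf{F}_K \mathbf{y}_{m_r}[2n] \\ \mathbf{F}_K \mathbf{P}_K^{(B)} \mathbf{y}_{m_r}^*[2n+1] \end{bmatrix}}_{\tilde{\mathbf{y}}_{m_r}[n]} = \underbrace{\begin{bmatrix} \tilde{\mathbf{H}}_{m_r,1} & \tilde{\mathbf{H}}_{m_r,2} \\ \tilde{\mathbf{H}}_{m_r,2}^* & -\tilde{\mathbf{H}}_{m_r,1}^* \end{bmatrix}}_{\tilde{\mathbf{H}}_{m_r}} \tilde{\mathbf{x}}[n] + \underbrace{\begin{bmatrix} \mathbf{F}_K \mathbf{e}_{m_r}[2n] \\ \mathbf{F}_K \mathbf{P}_K^{(B)} \mathbf{e}_{m_r}^*[2n+1] \end{bmatrix}}_{\tilde{\mathbf{e}}_{m_r}[n]} \qquad (63)$$

where

$$\tilde{\mathbf{H}}_{m_r,m_t} := \mathbf{F}_K \dot{\mathbf{H}}_{m_r,m_t} \mathbf{F}_K^H$$

is the $K \times K$ diagonal FD channel matrix having the FD channel response as its main diagonal. Note from Equation 63 that

$$\tilde{\mathbf{x}}[n] := \bar{\mathbf{F}}_K \bar{\mathbf{T}}_{zp} \mathbf{x}[n]$$

where both the compound FFT matrix $\bar{\mathbf{F}}_K := \mathrm{diag}\{\mathbf{F}_K, \mathbf{F}_K\}$ and the compound transmit matrix $\bar{\mathbf{T}}_{zp} := \mathrm{diag}\{\mathbf{T}_{zp}, \mathbf{T}_{zp}\}$ are block diagonal. Stacking the contributions of the $M_r$ receive antennas,

$$\tilde{\mathbf{y}}[n] = [\tilde{\mathbf{y}}_1^T[n], \ldots, \tilde{\mathbf{y}}_{M_r}^T[n]]^T,$$

we obtain the following per-RX-antenna FD data model:

$$\tilde{\mathbf{y}}[n] = \tilde{\mathbf{H}} \tilde{\mathbf{x}}[n] + \tilde{\mathbf{e}}[n] \qquad (64)$$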

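where the per-RX-antenna channel matrix $\tilde{\mathbf{H}}$ and the per-RX-antenna noise block $\tilde{\mathbf{e}}[n]$ are similarly defined as the per-RX-antenna output block $\tilde{\mathbf{y}}[n]$. Defining the per-tone input block $\tilde{\mathbf{x}}'[n]$ and the per-tone output block $\tilde{\mathbf{y}}'[n]$ as:

$$\tilde{\mathbf{x}}'[n] := \mathbf{P}_t \tilde{\mathbf{x}}[n] = [\tilde{\mathbf{x}}_1'^T[n], \ldots, \tilde{\mathbf{x}}_K'^T[n]]^T$$
$$\tilde{\mathbf{y}}'[n] := \mathbf{P}_r \tilde{\mathbf{y}}[n] = [\tilde{\mathbf{y}}_1'^T[n], \ldots, \tilde{\mathbf{y}}_K'^T[n]]^T \qquad (65)$$

where $\mathbf{P}_t$ permutes a per-TX-antenna ordering into a per-tone ordering and where $\mathbf{P}_r$ permutes a per-RX-antenna ordering into a per-tone ordering, we obtain the following per-tone data model:

$$\tilde{\mathbf{y}}'[n] = \tilde{\mathbf{H}}' \tilde{\mathbf{x}}'[n] + \tilde{\mathbf{e}}'[n] \qquad (66)$$

where $\tilde{\mathbf{e}}'[n]$ is the per-tone noise block, similarly defined as $\tilde{\mathbf{y}}'[n]$. The per-tone channel matrix $\tilde{\mathbf{H}}'$ is a block diagonal matrix, given by:

$$\tilde{\mathbf{H}}' := \mathbf{P}_r \tilde{\mathbf{H}} \mathbf{P}_t^T = \mathrm{diag}\{\tilde{\mathbf{H}}_1', \ldots, \tilde{\mathbf{H}}_K'\} \qquad (67)$$
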
C.1.c Data model for burst processing. 

[0112] Assuming a burst length of $M_t \cdot B \cdot I$ symbols for each user, we can stack $I \cdot N$ consecutive chip blocks $\tilde{\mathbf{y}}[n]$, defined in Equation 64, into

$$\tilde{\mathbf{Y}} := [\tilde{\mathbf{y}}[0], \ldots, \tilde{\mathbf{y}}[IN-1]],$$

leading to the following per-RX-antenna data model for burst processing:

$$\tilde{\mathbf{Y}} = \tilde{\mathbf{H}} \tilde{\mathbf{X}} + \tilde{\mathbf{E}} \qquad (68)$$

where the input matrix $\tilde{\mathbf{X}}$ and the noise matrix $\tilde{\mathbf{E}}$ are similarly defined as the output matrix $\tilde{\mathbf{Y}}$. Note that

$$\tilde{\mathbf{X}} = \bar{\mathbf{F}}_K \bar{\mathbf{T}}_{zp} \mathbf{X} \qquad (69)$$

where $\mathbf{X}$ stacks $I \cdot N$ consecutive total multi-user chip blocks $\mathbf{x}[n]$. Moreover, by inspecting Equation 60, we can write $\mathbf{X}$ as:

$$\mathbf{X} = \mathbf{S}_d \mathbf{C}_d + \mathbf{S}_p \mathbf{C}_p \qquad (70)$$

where the multi-user total data symbol matrix $\mathbf{S}_d := [\mathbf{S}_1, \ldots, \mathbf{S}_U]$ stacks the total data symbol matrices of the different active users and the u-th user's total data symbol matrix

$$\mathbf{S}_u := [\mathbf{s}^u[0], \ldots, \mathbf{s}^u[I-1]]$$

stacks I consecutive total symbol blocks for the u-th user. The total pilot symbol matrix $\mathbf{S}_p$ is similarly defined as $\mathbf{S}_u$. The multi-user code matrix

$$\mathbf{C}_d := [\mathbf{C}_1^T, \ldots, \mathbf{C}_U^T]^T$$

stacks the code matrices of the different active users. The u-th user's code matrix stacks the u-th user's composite code vectors at I consecutive symbol block instants:

$$\mathbf{C}_u := \mathrm{diag}\{\mathbf{c}_u[0], \ldots, \mathbf{c}_u[I-1]\} \qquad (71)$$

where

$$\mathbf{c}_u[i] := [c_u[iN], \ldots, c_u[(i+1)N-1]]$$

is the u-th user's composite code vector used to spread the total symbol block $\mathbf{s}^u[i]$. The pilot code matrix $\mathbf{C}_p$ is similarly defined as $\mathbf{C}_u$.

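[0113] Similarly to the per-RX-antenna data model for burst processing in Equation 68, we can stack $I \cdot N$ consecutive chip blocks $\tilde{\mathbf{y}}'[n]$, leading to the following per-tone data model for burst processing:

$$\tilde{\mathbf{Y}}' = \tilde{\mathbf{H}}' \tilde{\mathbf{X}}' + \tilde{\mathbf{E}}' \qquad (72)$$

[0114] Using Equations 65, 69 and 70, we can express $\tilde{\mathbf{X}}'$ as:

$$\tilde{\mathbf{X}}' = \tilde{\mathbf{S}}_d' \mathbf{C}_d + \tilde{\mathbf{S}}_p' \mathbf{C}_p \qquad (73)$$

where $\tilde{\mathbf{S}}_d' := \mathbf{P}_t \tilde{\mathbf{S}}_d$ and $\tilde{\mathbf{S}}_p' := \mathbf{P}_t \tilde{\mathbf{S}}_p$ are the per-tone permuted versions of $\tilde{\mathbf{S}}_d := \bar{\mathbf{F}}_K \bar{\mathbf{T}}_{zp} \mathbf{S}_d$ respectively $\tilde{\mathbf{S}}_p := \bar{\mathbf{F}}_K \bar{\mathbf{T}}_{zp} \mathbf{S}_p$.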
C.2 Burst frequency-domain chip equalization 

[0115] Armed with a suitable data model for burst processing, we can now proceed with the design of different Least Squares (LS) type of burst FD chip equalizers that process a burst of $M_t I$ data symbol blocks at once. Note that Recursive Least Squares (RLS) type of adaptive FD chip equalizers that process the data on a symbol block by symbol block basis can be easily derived from their corresponding LS burst version. Starting from Equation 72 and assuming the channel matrix $\tilde{\mathbf{H}}'$ to have full column rank and the input matrix $\tilde{\mathbf{X}}'$ to have full row rank, it is always possible to find a Zero-Forcing (ZF) chip equalizer matrix $\mathbf{G}'$, for which $\mathbf{G}' \tilde{\mathbf{Y}}' - \tilde{\mathbf{X}}' = \mathbf{0}$, provided there is no noise present in the output matrix $\tilde{\mathbf{Y}}'$. In the presence of noise, we have to solve the corresponding Least Squares (LS) minimization problem, which we denote for convenience as:

$$\mathbf{G}' \tilde{\mathbf{Y}}' - \tilde{\mathbf{X}}' \overset{LS}{=} \mathbf{0} \qquad (74)$$

[0116] Since the channel matrix $\tilde{\mathbf{H}}'$ has a block diagonal structure, as shown in Equation 67, the equalizer matrix $\mathbf{G}'$ suffices to have a block diagonal structure as well:

$$\mathbf{G}' := \mathrm{diag}\{\mathbf{G}_1', \ldots, \mathbf{G}_K'\} \qquad (75)$$

acting on a per-tone basis at the chip block level (see also Figure 6). For this reason, the LS problem of Equation 74 decouples into K parallel and independent LS problems, one for each tone. Using Equation 73, we can rewrite the original LS problem of Equation 74 as:

$$\mathbf{G}' \tilde{\mathbf{Y}}' - \tilde{\mathbf{S}}_d' \mathbf{C}_d - \tilde{\mathbf{S}}_p' \mathbf{C}_p \overset{LS}{=} \mathbf{0} \qquad (76)$$

which is an LS problem in both the equalizer matrix $\mathbf{G}'$ and the multi-user total data symbol matrix $\tilde{\mathbf{S}}_d'$. Starting from Equation 76, we will design in the following two different FD methods for direct chip equalizer estimation that differ in the amount of a-priori information they exploit to determine the equalizer coefficients. The first method, coined CDMP-trained, only exploits the presence of a Code Division Multiplexed Pilot (CDMP). The second method, coined semi-blind CDMP-trained, additionally exploits knowledge of the multi-user code correlation matrix.

C.2.a CDMP-trained chip equalizer 

[0117] The CDMP-trained chip equalizer estimator directly determines the equalizer coefficients from the per-tone output matrix $\tilde{\mathbf{Y}}'$ based on the knowledge of the pilot code matrix $\mathbf{C}_p$ and the total pilot symbol matrix $\tilde{\mathbf{S}}_p'$. By despreading Equation 76 with the pilot code matrix $\mathbf{C}_p$, we obtain:

$$\mathbf{G}' \tilde{\mathbf{Y}}' \mathbf{C}_p^H - \tilde{\mathbf{S}}_p' \overset{LS}{=} \mathbf{0} \qquad (77)$$

because $\mathbf{C}_d \mathbf{C}_p^H = \mathbf{0}$ due to the orthogonality of the user and pilot composite code sequences at each symbol instant. Equation 77 can be interpreted as follows. The equalized per-tone output matrix after despreading, $\mathbf{G}' \tilde{\mathbf{Y}}' \mathbf{C}_p^H$, should be as close as possible in a Least Squares sense to the per-tone pilot symbol matrix $\tilde{\mathbf{S}}_p'$.

C.2.b Semi-blind CDMP-trained chip equalizer. 

[0118] The semi-blind CDMP-trained chip equalizer estimator directly determines the equalizer coefficients from the per-tone output matrix $\tilde{\mathbf{Y}}'$ based on the knowledge of the multi-user code matrix $\mathbf{C}_d$, the pilot code matrix $\mathbf{C}_p$ and the per-tone pilot symbol matrix $\tilde{\mathbf{S}}_p'$. By despreading Equation 76 with the multi-user code matrix $\mathbf{C}_d$ and by assuming the per-tone equalizer matrix $\mathbf{G}'$ to be known and fixed, we obtain an LS estimate of the per-tone multi-user data symbol matrix:

$$\hat{\tilde{\mathbf{S}}}_d' = \mathbf{G}' \tilde{\mathbf{Y}}' \mathbf{C}_d^H.$$

Substituting $\hat{\tilde{\mathbf{S}}}_d'$ into the original LS problem of Equation 76 leads to a modified LS problem in the per-tone equalizer matrix $\mathbf{G}'$ only:

$$\mathbf{G}' \tilde{\mathbf{Y}}' (\mathbf{I}_{IN} - \mathbf{C}_d^H \mathbf{C}_d) - \tilde{\mathbf{S}}_p' \mathbf{C}_p \overset{LS}{=} \mathbf{0} \qquad (78)$$

which can be interpreted as follows. The equalized per-tone output matrix $\mathbf{G}' \tilde{\mathbf{Y}}'$ is first projected on the orthogonal complement of the subspace spanned by the multi-user code matrix $\mathbf{C}_d$, employing the projection matrix $\mathbf{I}_{IN} - \mathbf{C}_d^H \mathbf{C}_d$. The resulting equalized per-tone output matrix after projecting should then be as close as possible in a Least Squares sense to the per-tone pilot chip matrix $\tilde{\mathbf{S}}_p' \mathbf{C}_p$.

C.2.c User-specific detection 

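[0119] As shown in Figure 6, the obtained per-tone chip equalizer matrix $\hat{\mathbf{G}}'$, whether CDMP-trained or semi-blind CDMP-trained, may subsequently be used to extract the desired user's total data symbol matrix:

$$\hat{\mathbf{S}}_u = \bar{\mathbf{T}}_{zp}^T \bar{\mathbf{F}}_K^H \mathbf{P}_t^T \hat{\mathbf{G}}' \tilde{\mathbf{Y}}' \mathbf{C}_u^H \qquad (79)$$

where the estimated FD input matrix

$$\hat{\tilde{\mathbf{X}}} = \mathbf{P}_t^T \hat{\mathbf{G}}' \tilde{\mathbf{Y}}'$$

is transformed to the TD by the compound IFFT matrix $\bar{\mathbf{F}}_K^H$ and has its zero postfix removed by the transpose of the ZP transmit matrix $\bar{\mathbf{T}}_{zp}$. The resulting estimate of the TD input matrix $\hat{\mathbf{X}}$ is finally despread with the desired user's code matrix $\mathbf{C}_u$ to obtain an estimate of the desired user's total data symbol matrix $\hat{\mathbf{S}}_u$.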

C. 3 Conclusion 

[0120] In this section, we have combined Single-Carrier Block Transmission (SCBT) DS-CDMA with Time Reversal (TR) Space-Time Block Coding (STBC) for downlink multi-user MIMO communications. Moreover, we have developed two new direct equalizer estimation methods that act on a per-tone basis in the Frequency-Domain (FD) exploiting a Code Division Multiplexed Pilot (CDMP). The regular CDMP-trained method only exploits the presence of a CDMP, whereas the semi-blind CDMP-trained method additionally capitalizes on the unused spreading codes in a practical CDMA system. Both the regular and the semi-blind CDMP-trained method come close to extracting the full diversity of order $M_t \cdot M_r \cdot (L+1)$ independently of the system load. The semi-blind CDMP-trained method outperforms the regular CDMP-trained method for low to medium system load and proves its usefulness especially for small burst lengths.

D. Embodiment 

D.1 SCBT-DS-CDMA downlink system model

[0121] Let us consider the downlink of a single-cell SCBT-DS-CDMA system with U active mobile stations. The base station has a single transmit antenna whereas the mobile station of interest has possibly multiple receive antennas.

D.1.a Transmitter model for the base station 

[0122] As shown in Figure 12, the base station transforms U user data symbol sequences $\{s^u[i]\}_{u=1}^{U}$ and a pilot symbol sequence $s^p[i]$ into a single transmitted chip sequence $u[n]$. Each user's data symbol sequence $s^u[i]$ (pilot symbol sequence $s^p[i]$) is serial-to-parallel converted into blocks of B symbols, leading to the data symbol block sequence

$$\mathbf{s}^u[i] := [s^u[iB], \ldots, s^u[(i+1)B-1]]^T$$

(pilot symbol block sequence $\mathbf{s}^p[i]$). The u-th user's data symbol block sequence $\mathbf{s}^u[i]$ (pilot symbol block sequence $\mathbf{s}^p[i]$) is subsequently spread by a factor N with the user composite code sequence $c_u[n]$ (pilot composite code sequence $c_p[n]$), which is the multiplication of a user specific (pilot specific) orthogonal Walsh-Hadamard spreading code and a base station specific scrambling code. The different user chip block sequences and the pilot chip block sequence are added, resulting in the multi-user chip block sequence:

$$\mathbf{x}[n] = \sum_{u=1}^{U} \mathbf{s}^u[i] \, c_u[n] + \mathbf{s}^p[i] \, c_p[n], \qquad i := \left\lfloor \frac{n}{N} \right\rfloor \qquad (80)$$

The $B \times 1$ multi-user chip block sequence $\mathbf{x}[n]$ is transformed into the $K \times 1$ transmitted chip block sequence:

$$\mathbf{u}[n] = \mathbf{T}_1 \mathbf{x}[n] + \mathbf{T}_2 \mathbf{b} \qquad (81)$$

with $K = B + \mu$, where $\mathbf{T}_1 := [\mathbf{I}_B \;\; \mathbf{0}_{B \times \mu}]^T$ is the $K \times B$ zero padding (ZP) transmit matrix and $\mathbf{T}_2 := [\mathbf{0}_{\mu \times B} \;\; \mathbf{I}_\mu]^T$ is the $K \times \mu$ Known Symbol Padding (KSP) transmit matrix. Note that this operation adds a $\mu \times 1$ known symbol postfix $\mathbf{b}$ to each block of the multi-user chip block sequence $\mathbf{x}[n]$. Finally, the transmitted chip block sequence $\mathbf{u}[n]$ is parallel-to-serial converted into the corresponding transmitted chip sequence

$$[u[nK], \ldots, u[(n+1)K-1]]^T := \mathbf{u}[n].$$
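The padding of Equation 81 amounts to appending the same known postfix to every block. The sketch below, with hypothetical chip values and channel taps, also checks a consequence of that structure: when the channel order does not exceed the postfix length, every received block from the second one onwards equals a circular convolution of the channel with the transmitted block, because the postfix of the previous block plays the role of a cyclic prefix.

```python
def ksp_transmit(blocks, b):
    """u[n] = T1 x[n] + T2 b: append the known postfix b to every block."""
    return [list(x) + list(b) for x in blocks]

def channel(chips, h):
    """Noise-free linear convolution with the channel taps h, truncated to len(chips)."""
    return [sum(h[l] * chips[n - l] for l in range(len(h)) if n - l >= 0)
            for n in range(len(chips))]

def circular_convolution(block, h):
    """Circular convolution of one block with the channel taps."""
    K = len(block)
    return [sum(h[l] * block[(k - l) % K] for l in range(len(h))) for k in range(K)]

blocks = [[1, 2, 3], [4, 5, 6]]    # B = 3 chips per block (illustrative)
b = [7, 8]                         # known postfix, mu = 2
h = [1, 0.5, 0.25]                 # assumed channel of order L = 2 <= mu
tx_blocks = ksp_transmit(blocks, b)
serial = [chip for blk in tx_blocks for chip in blk]
rx = channel(serial, h)

K = len(tx_blocks[0])
second_rx_block = rx[K:2 * K]      # the first block has no preceding postfix
```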

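D.1.b Receiver model for the mobile station

[0123] It is assumed that the mobile station of interest is equipped with $M_r$ receive antennas and has acquired perfect synchronisation. As shown in Figure 13, the mobile station of interest transforms $M_r$ received chip sequences $\{v_{m_r}[n]\}_{m_r=1}^{M_r}$ into an estimate of the desired user's data symbol sequence $\hat{s}^u[i]$ (we assume the u-th user to be the desired user). At each receive antenna, the received chip sequence $v_{m_r}[n]$ is serial-to-parallel converted into blocks of K chips, resulting in the received chip block sequence

$$\mathbf{v}_{m_r}[n] := [v_{m_r}[nK], \ldots, v_{m_r}[(n+1)K-1]]^T.$$

The $K \times K$ receive matrix $\mathbf{R} := \mathbf{I}_K$ completely preserves each block of the received chip block sequence $\mathbf{v}_{m_r}[n]$, leading to the received multi-user chip block sequence $\mathbf{y}_{m_r}[n] := \mathbf{R} \mathbf{v}_{m_r}[n]$. Assuming a sufficiently long known symbol postfix $\mu \geq L$ (L is the maximum channel order of all channels), we obtain a simple input/output relationship in the time-domain:

$$\mathbf{y}_{m_r}[n] = \dot{\mathbf{H}}_{m_r} (\mathbf{T}_1 \mathbf{x}[n] + \mathbf{T}_2 \mathbf{b}) + \mathbf{z}_{m_r}[n] \qquad (82)$$

where $\mathbf{z}_{m_r}[n]$ is the received noise block sequence and $\dot{\mathbf{H}}_{m_r}$ is a $K \times K$ circulant channel matrix describing the multi-path propagation from the base station's transmit antenna to the mobile station's $m_r$-th receive antenna. Note that there is no Inter Block Interference (IBI) because $\mathbf{b}$ acts as a cyclic prefix for each transmitted chip block $\mathbf{u}[n]$. Transforming the time-domain (TD) received chip block sequence $\mathbf{y}_{m_r}[n]$ into the corresponding frequency-domain (FD) received chip block sequence

$$\tilde{\mathbf{y}}_{m_r}[n] := \mathbf{F}_K \mathbf{y}_{m_r}[n],$$

with $\mathbf{F}_K$ the $K \times K$ FFT matrix, leads to the following FD input/output relationship:

$$\tilde{\mathbf{y}}_{m_r}[n] = \tilde{\mathbf{H}}_{m_r} \tilde{\mathbf{x}}[n] + \tilde{\mathbf{z}}_{m_r}[n] \qquad (83)$$

where

$$\tilde{\mathbf{z}}_{m_r}[n] := \mathbf{F}_K \mathbf{z}_{m_r}[n]$$

is the FD received noise block sequence,

$$\tilde{\mathbf{x}}[n] := \mathbf{F}_K (\mathbf{T}_1 \mathbf{x}[n] + \mathbf{T}_2 \mathbf{b})$$

is the FD transmitted chip block sequence and

$$\tilde{\mathbf{H}}_{m_r} := \mathrm{diag}\{\tilde{\mathbf{h}}_{m_r}\}$$

is the $K \times K$ diagonal FD channel matrix having the FD channel response $\tilde{\mathbf{h}}_{m_r}$ for the $m_r$-th receive antenna as its main diagonal. Stacking the FD received chip block sequences of the $M_r$ receive antennas, we finally obtain the following FD data model:

$$\underbrace{\begin{bmatrix} \tilde{\mathbf{y}}_1[n] \\ \vdots \\ \tilde{\mathbf{y}}_{M_r}[n] \end{bmatrix}}_{\tilde{\mathbf{y}}[n]} = \underbrace{\begin{bmatrix} \tilde{\mathbf{H}}_1 \\ \vdots \\ \tilde{\mathbf{H}}_{M_r} \end{bmatrix}}_{\tilde{\mathbf{H}}} \tilde{\mathbf{x}}[n] + \underbrace{\begin{bmatrix} \tilde{\mathbf{z}}_1[n] \\ \vdots \\ \tilde{\mathbf{z}}_{M_r}[n] \end{bmatrix}}_{\tilde{\mathbf{z}}[n]} \qquad (84)$$

D.1.c Data model for burst processing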

[0124] Assuming a burst length of $I \cdot B$ data symbols for each user, we can stack $I \cdot N$ consecutive FD received chip blocks $\tilde{\mathbf{y}}[n]$, defined in Equation 84, into an FD output matrix

$$\tilde{\mathbf{Y}} := [\tilde{\mathbf{y}}[0], \ldots, \tilde{\mathbf{y}}[IN-1]],$$

leading to the following FD data model for burst processing:

$$\tilde{\mathbf{Y}} = \tilde{\mathbf{H}} \tilde{\mathbf{X}} + \tilde{\mathbf{Z}} \qquad (85)$$

where the FD input matrix $\tilde{\mathbf{X}}$ and the FD noise matrix $\tilde{\mathbf{Z}}$ are similarly defined as the FD output matrix $\tilde{\mathbf{Y}}$. Note from Equation 83 that:

$$\tilde{\mathbf{X}} = \mathbf{F}_K (\mathbf{T}_1 \mathbf{X} + \mathbf{T}_2 \mathbf{B}) \qquad (86)$$

where the TD input matrix $\mathbf{X}$ stacks $I \cdot N$ consecutive multi-user chip blocks $\mathbf{x}[n]$ and the KSP matrix $\mathbf{B}$ repeats $I \cdot N$ times the known symbol postfix $\mathbf{b}$. By inspecting Equation 80, we can also write $\mathbf{X}$ as follows:

$$\mathbf{X} = \mathbf{S}_d \mathbf{C}_d + \mathbf{S}_p \mathbf{C}_p \qquad (87)$$

where the multi-user data symbol matrix

$$\mathbf{S}_d := [\mathbf{S}_1, \ldots, \mathbf{S}_U]$$

stacks the data symbol matrices of the different active users and the u-th user's data symbol matrix

$$\mathbf{S}_u := [\mathbf{s}^u[0], \ldots, \mathbf{s}^u[I-1]]$$

stacks I consecutive data symbol blocks of the u-th user. The pilot symbol matrix $\mathbf{S}_p$ is similarly defined as $\mathbf{S}_u$. The multi-user code matrix

$$\mathbf{C}_d := [\mathbf{C}_1^T, \ldots, \mathbf{C}_U^T]^T$$

stacks the code matrices of the different active users. The u-th user's code matrix

$$\mathbf{C}_u := \mathrm{diag}\{\mathbf{c}_u[0], \ldots, \mathbf{c}_u[I-1]\}$$

stacks the u-th user's composite code vectors at I consecutive symbol instants, and the u-th user's composite code vector

$$\mathbf{c}_u[i] := [c_u[iN], \ldots, c_u[(i+1)N-1]]$$

is used to spread the data symbol block $\mathbf{s}^u[i]$. The pilot code matrix $\mathbf{C}_p$ and the pilot composite code vector $\mathbf{c}_p[i]$ are similarly defined as $\mathbf{C}_u$ respectively $\mathbf{c}_u[i]$.

D.2 Burst frequency-domain chip equalization 

[0125] Armed with a suitable data model for burst processing, we can now proceed with the design of different Least Squares (LS) type of burst frequency-domain (FD) chip equalizers that process a burst of I data symbol blocks at once. Note that Recursive Least Squares (RLS) type of adaptive FD chip equalizers that process the data on a symbol block by symbol block basis can be easily derived from their corresponding LS burst version. Starting from Equation 85 and assuming the total FD channel matrix $\tilde{\mathbf{H}}$ to have full column rank and the FD input matrix $\tilde{\mathbf{X}}$ to have full row rank, it is always possible to find a Zero-Forcing (ZF) FD chip equalizer matrix $\tilde{\mathbf{G}}$, for which

$$\tilde{\mathbf{G}} \tilde{\mathbf{Y}} - \tilde{\mathbf{X}} = \mathbf{0},$$

provided there is no noise present in the FD output matrix $\tilde{\mathbf{Y}}$. In the presence of noise, we have to solve the corresponding Least Squares (LS) minimization problem, which we denote for convenience as:

$$\tilde{\mathbf{G}} \tilde{\mathbf{Y}} - \tilde{\mathbf{X}} \overset{LS}{=} \mathbf{0} \qquad (88)$$

[0126] Since the total FD channel matrix $\tilde{\mathbf{H}}$ stacks $M_r$ diagonal FD channel matrices, as indicated by Equations 83 and 84, the total FD equalizer matrix $\tilde{\mathbf{G}}$ suffices to have a similar structure:

$$\tilde{\mathbf{G}} := [\tilde{\mathbf{G}}_1 \; \ldots \; \tilde{\mathbf{G}}_{M_r}] \qquad (89)$$

where the diagonal FD equalizer matrix for the $m_r$-th receive antenna, $\tilde{\mathbf{G}}_{m_r}$, acts on a per-tone basis at the chip block level (see also Figure 10). Using Equations 86 and 87, we can rewrite the original LS problem of Equation 88 as:

$$\tilde{\mathbf{G}} \tilde{\mathbf{Y}} - \tilde{\mathbf{S}}_d \mathbf{C}_d - \tilde{\mathbf{S}}_p \mathbf{C}_p - \mathbf{F}_K \mathbf{T}_2 \mathbf{B} \overset{LS}{=} \mathbf{0} \qquad (90)$$

where the FD multi-user data symbol matrix $\tilde{\mathbf{S}}_d$ and the FD pilot symbol matrix $\tilde{\mathbf{S}}_p$ are defined as:

$$\tilde{\mathbf{S}}_d := \mathbf{F}_K \mathbf{T}_1 \mathbf{S}_d, \qquad \tilde{\mathbf{S}}_p := \mathbf{F}_K \mathbf{T}_1 \mathbf{S}_p \qquad (91)$$

[0127] Starting from Equation 90, we will design in the following three different FD methods for direct chip equalizer estimation that differ in the amount of a-priori information they exploit to determine the equalizer coefficients. The first method, coined KSP-trained, only exploits the presence of a known symbol postfix. The last two methods, coined joint CDMP/KSP-trained and semi-blind joint CDMP/KSP-trained, exploit the presence of both a known symbol postfix and a Code Division Multiplexed Pilot (CDMP).

D.2.a KSP-trained chip equalizer 

[0128] The KSP-trained chip equalizer estimator directly determines the equalizer coefficients from the FD output matrix $\tilde{\mathbf{Y}}$ based on the knowledge of the KSP matrix $\mathbf{B}$. By transforming Equation 90 to the TD with the IFFT matrix $\mathbf{F}_K^H$ and by selecting the known symbol postfix with the KSP transmit matrix $\mathbf{T}_2$, we obtain:

$$\mathbf{T}_2^T \mathbf{F}_K^H \tilde{\mathbf{G}} \tilde{\mathbf{Y}} - \mathbf{B} \overset{LS}{=} \mathbf{0} \qquad (92)$$

because $\mathbf{T}_2^T \mathbf{T}_1 = \mathbf{0}_{\mu \times B}$ and $\mathbf{T}_2^T \mathbf{T}_2 = \mathbf{I}_\mu$. Using the definition of $\mathbf{T}_2$ in Equation 81, we can rewrite Equation 92 as:

$$\mathbf{F}_K^H(B+1:K, :) \, \tilde{\mathbf{G}} \tilde{\mathbf{Y}} - \mathbf{B} \overset{LS}{=} \mathbf{0} \qquad (93)$$

which can be interpreted as follows. The equalized FD output matrix $\tilde{\mathbf{G}} \tilde{\mathbf{Y}}$ is transformed to the TD with the last $\mu$ rows of the IFFT matrix $\mathbf{F}_K^H$. The resulting matrix should be as close as possible to the KSP matrix $\mathbf{B}$ in a Least Squares sense.

D.2.b Joint CDMP/KSP-trained chip equalizer

[0129] The joint CDMP/KSP-trained chip equalizer estimator directly determines the equalizer coefficients from the FD output matrix $\tilde{\mathbf{Y}}$ based on the knowledge of the pilot code matrix $\mathbf{C}_p$, the pilot symbol matrix $\mathbf{S}_p$ and the KSP matrix $\mathbf{B}$. By despreading Equation 90 with the pilot code matrix $\mathbf{C}_p$, we obtain:

$$\tilde{\mathbf{G}} \tilde{\mathbf{Y}} \mathbf{C}_p^H - \mathbf{F}_K (\mathbf{T}_1 \mathbf{S}_p + \mathbf{T}_2 \mathbf{B} \mathbf{C}_p^H) \overset{LS}{=} \mathbf{0} \qquad (94)$$

because $\mathbf{C}_d \mathbf{C}_p^H = \mathbf{0}$ due to the orthogonality of the user and the pilot composite code sequences at each symbol block instant. Equation 94 can be interpreted as follows. The equalized FD output matrix after despreading, $\tilde{\mathbf{G}} \tilde{\mathbf{Y}} \mathbf{C}_p^H$, should be as close as possible in a Least Squares sense to the FD version of the pilot symbol matrix $\mathbf{S}_p$ padded with the KSP matrix after despreading, $\mathbf{B} \mathbf{C}_p^H$.

D.2.c Semi-blind joint CDMP/KSP-trained chip equalizer 



[0130] The semi-blind joint CDMP/KSP-trained chip equalizer estimator directly determines the equalizer coefficients from the FD output matrix $\tilde{\mathbf{Y}}$ based on the knowledge of the multi-user code matrix $\mathbf{C}_d$, the pilot code matrix $\mathbf{C}_p$, the pilot symbol matrix $\mathbf{S}_p$ and the KSP matrix $\mathbf{B}$. By despreading Equation 90 with the multi-user code matrix $\mathbf{C}_d$ and by assuming the FD equalizer matrix $\tilde{\mathbf{G}}$ to be known and fixed, we obtain an LS estimate of the multi-user data symbol matrix $\tilde{\mathbf{S}}_d$:

$$\hat{\tilde{\mathbf{S}}}_d = \tilde{\mathbf{G}} \tilde{\mathbf{Y}} \mathbf{C}_d^H - \mathbf{F}_K \mathbf{T}_2 \mathbf{B} \mathbf{C}_d^H \qquad (95)$$

because $\mathbf{C}_p \mathbf{C}_d^H = \mathbf{0}_{I \times UI}$ due to the orthogonality of the pilot and user composite code sequences at each symbol block instant. Substituting $\hat{\tilde{\mathbf{S}}}_d$ into the original LS problem of Equation 90 leads to

$$\tilde{\mathbf{G}} \tilde{\mathbf{Y}} \mathbf{P}_d - \mathbf{F}_K (\mathbf{T}_1 \mathbf{S}_p \mathbf{C}_p + \mathbf{T}_2 \mathbf{B} \mathbf{P}_d) \overset{LS}{=} \mathbf{0} \qquad (96)$$

where the projection matrix $\mathbf{P}_d$ is defined as:

$$\mathbf{P}_d := \mathbf{I}_{IN} - \mathbf{C}_d^H \mathbf{C}_d \qquad (97)$$

[0131] Equation 96 can be interpreted as follows. Both the FD output matrix $\tilde{\mathbf{Y}}$ and the KSP matrix $\mathbf{B}$ are projected on the orthogonal complement of the subspace spanned by the multi-user code matrix $\mathbf{C}_d$, employing the projection matrix $\mathbf{P}_d$. The equalized FD output matrix after projecting, $\tilde{\mathbf{G}} \tilde{\mathbf{Y}} \mathbf{P}_d$, should then be as close as possible in a Least Squares sense to the FD version of the pilot chip matrix $\mathbf{S}_p \mathbf{C}_p$ padded with the KSP matrix after projection, $\mathbf{B} \mathbf{P}_d$.


D.2.d User-specific detection 

[0132] As shown in Figure 10, the obtained FD chip equalizer matrix $\hat{\tilde{\mathbf{G}}}$, whether KSP-trained, joint CDMP/KSP-trained or semi-blind joint CDMP/KSP-trained, may subsequently be used to extract the desired user's data symbol matrix:

$$\hat{\mathbf{S}}_u = \mathbf{T}_1^T \mathbf{F}_K^H \hat{\tilde{\mathbf{G}}} \tilde{\mathbf{Y}} \mathbf{C}_u^H \qquad (98)$$

where the estimated FD input matrix $\hat{\tilde{\mathbf{X}}} = \hat{\tilde{\mathbf{G}}} \tilde{\mathbf{Y}}$ is transformed to the TD by the IFFT matrix $\mathbf{F}_K^H$ and has its known symbol postfix removed by the transpose of the ZP transmit matrix $\mathbf{T}_1$. The resulting estimate of the TD input matrix $\hat{\mathbf{X}}$ is finally despread with the desired user's code matrix $\mathbf{C}_u$ to obtain an estimate of the desired user's data symbol matrix.

D.3 Conclusion 

[0133] In this section, three new direct equalizer estimation methods have been developed for single-carrier block
transmission (SCBT) DS-CDMA with Known Symbol Padding (KSP). The KSP-trained method, which exploits only the
presence of the known symbol postfix, achieves reasonable performance only for rather large burst lengths. The joint
CDMP/KSP-trained method, which additionally exploits the presence of a Code Division Multiplexed Pilot (CDMP),
outperforms the KSP-trained method for both large and small burst lengths. The semi-blind joint CDMP/KSP-trained
method, which additionally capitalizes on the unused spreading codes in a practical CDMA system (assuming knowledge of
the multi-user code correlation matrix), proves its usefulness especially for small burst lengths. It outperforms the regular
joint CDMP/KSP-trained method while staying within reasonable range of the ideal fully-trained method.
[0134] One can conclude that, from a performance point of view, the semi-blind joint CDMP/KSP-trained method for 
direct equalizer estimation is an interesting technique for future broadband cellular systems based on SCBT-DS-CDMA. 


Claims 



1. In a communication system having at least one base station and at least one terminal, a method for multi-user 
wireless transmission of data signals, comprising, for a plurality of users, the following steps : 

- adding robustness to frequency-selective fading to said data to be transmitted, 

- performing spreading and scrambling of at least a portion of a block of data, obtainable by grouping data 
symbols by demultiplexing using a serial-to-parallel operation, 

- combining (summing) spread and scrambled portions of said blocks of at least two users,

- adding transmit redundancy to said combined spread and scrambled portions, and

- transmitting said combined spread and scrambled portions with transmit redundancy.

2. The method recited in 1, wherein said spreading and scrambling is performed by a code sequence, obtained by
multiplying a user(terminal)-specific code and a base station specific scrambling code.

3. The method recited in 1, wherein preceding the steps of claim 1, the step is performed of generating a plurality of
independent block portions.

4. The method recited in 1, wherein preceding the steps of claim 1, the step is performed of generating block portions.

5. The method recited in 3, wherein all the steps are performed as many times as there are block portions, thereby
generating streams comprising a plurality of combined spread and scrambled block portions.

6. The method recited in 5, wherein between the step of combining and the step of transmitting said spread and 
scrambled portions, the step is comprised of encoding each of said streams. 

7. The method recited in 5, wherein between the step of combining and the step of transmitting said spread and
scrambled portions, the step is comprised of space-time encoding said streams, thereby combining information from at
least two of said streams.

8. The method recited in 7, wherein said step of space-time encoding said streams is performed by block space-time
encoding or trellis space-time encoding.

9. The method recited in 1, wherein between the step of combining and the step of transmitting said spread and
scrambled portions, the step of inverse subband processing is comprised.

10. The method recited in 1, wherein the step of adding robustness to frequency-selective fading is performed by
adding linear precoding.

11. The method recited in 1, wherein the step of adding robustness to frequency-selective fading is performed by 
applying adaptive loading per user. 

12. The method recited in 1, wherein the step of combining spread and scrambled block portions includes the summing
of a pilot signal.

13. The method recited in 1, wherein the step of adding transmit redundancy comprises the addition of a cyclic prefix,
a zero postfix or a symbol postfix.

14. A transmit system device for wireless multi-user communication, applying the method recited in 1.

15. A transmit apparatus for wireless multi-user communication, comprising : 

- Circuitry for grouping data symbols to be transmitted,

- Means for applying a spreading and scrambling operation to said grouped data symbols,

- Circuitry for adding transmit redundancy to said spread and scrambled grouped data symbols, and

- At least one transmit antenna for transmitting said spread and scrambled grouped data symbols with transmit
redundancy.

16. The transmit apparatus recited in 15, further comprising means for adding robustness to frequency-selective fading
to said grouped data symbols.

17. The transmit apparatus recited in 15, further comprising a space-time encoder.

18. The transmit apparatus recited in 15, further comprising one or more circuits adapted for inverse subband processing
said grouped data symbols.

19. A method for receiving at least one signal in a multi-user wireless communication system having at least one base
station and at least one terminal, comprising the steps of

- Receiving a signal from at least one antenna,

- Subband processing of a version of said received signal,

- Separating the contributions of the various users in said received signal,

- Exploiting the additional robustness to frequency-selective fading property of said received signal.

20. The method recited in 19, wherein the step of separating the contributions consists in first filtering at chip rate at 
least a portion of the subband processed version of said received signal and then despreading. 

21. The method recited in 19, wherein the step of separating the contributions consists in first despreading and then 
filtering at least a portion of the subband processed version of said received signal. 

22. The method recited in 19, wherein the step of receiving a signal is performed for a plurality of antennas, thereby 
generating data streams and wherein the step of subband processing is performed on each of said data streams, 
yielding a subband processed version of said received signal. 

23. The method recited in 19, wherein the additional step of space-time decoding is performed on each of the streams.

24. The method recited in 23, wherein the step of space-time decoding is performed by block decoding or trellis
decoding.

25. The method recited in 20 or 21, wherein the additional step of inverse subband processing is performed on at least
one filtered, subband processed version of the received signal.

26. The method recited in 20 or 21, wherein the step of filtering is carried out by a filter of which the filter coefficients
are determined in a semi-blind fashion or in a training-based way.

27. The method recited in 20 or 21, wherein the step of filtering is carried out by a filter of which the filter coefficients
are determined without channel estimation.

28. The method recited in 20 or 21, wherein the step of filtering is carried out by a filter of which the filter coefficients
are determined such that one version of the filtered signal is as close as possible to a version of the pilot symbol.

29. The method recited in 28, wherein the version of the filtered signal is the filtered signal after despreading with a 
composite code of the base station specific scrambling code and the pilot code and wherein the version of the 
pilot symbol is the pilot symbol itself, put in per tone ordering. 

30. The method recited in 28, wherein the version of the filtered signal is the filtered signal after projecting on the
orthogonal complement of the subspace spanned by the composite codes of the base station specific scrambling
code and the user specific codes, and wherein the version of the pilot symbol is the pilot symbol spread with a
composite code of the base station specific scrambling code and the pilot code, and put in per tone ordering.

31. The method recited in 19, wherein the additional step of removing transmit redundancy is performed. 

32. The method recited in 19, wherein said additional robustness to fading is exploited by linear de-precoding. 

33. A receive system device for wireless multi-user communication, applying the method recited in 19. 

34. A receiver apparatus for wireless multi-user communication, comprising : 

- A plurality of antennas receiving signals,

- A plurality of circuits adapted for subband processing of said received signals,

- Circuitry being adapted for determining by despreading an estimate of subband processed symbols received
by at least one user.

35. The apparatus recited in 34, wherein said circuitry adapted for determining an estimate of symbols comprises a 
plurality of circuits for inverse subband processing. 

36. The apparatus recited in 34, wherein said circuitry adapted for determining an estimate of symbols further comprises
a plurality of filters to filter at least a portion of a subband processed version of said received signals.

37. The apparatus recited in 34, wherein said circuitry adapted for determining an estimate of symbols further comprises
a plurality of filters to filter at chip rate at least a portion of a subband processed version of said received
signals.

38. The apparatus recited in 34, further comprising a space-time decoder.